Too many ideas, too little data | Markus Nutz & Thomas Pawlitzki | ML Conference 2018

How to validate ideas when data is scarce: choose and deploy models pragmatically, communicate clearly with stakeholders, align projects with business goals, and measure their impact.

Key takeaways
  • Always start by defining a clear problem and setting goals.
  • Data availability is a major challenge, so a data strategy is essential.
  • Use small, incremental experiments to validate ideas and reduce risk.
  • Agile project management and iterative development can help overcome cold start problems.
  • Use random forests and other ensemble methods as strong baselines; averaging many trees reduces overfitting compared with a single tree.
  • Ensure data quality and preprocessing steps are robust and reproducible.
  • Use serverless architectures and cloud-based services to deploy models quickly and efficiently.
  • Consider using open data sources and public datasets to augment internal data.
  • Always have a clear understanding of the problem domain and the data being used to solve it.
  • Use clear and simple language to communicate data science insights to non-technical stakeholders.
  • Divide complex data science projects into smaller, manageable tasks and prioritize them based on business goals.
  • Use data to validate and iterate on business decisions, rather than relying on intuition or anecdotal evidence.
  • Ensure data science projects are aligned with business objectives and are measurable in terms of impact.
  • Use cloud-based services and developer tools to accelerate experimentation and iteration.
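
The takeaways on small experiments and measurable impact can be illustrated with a minimal sketch. It assumes a simple two-variant experiment (for example, a baseline and a new model) scored by a success rate, and compares the two with a two-proportion z-test. The function name `two_proportion_z_test` and all counts are hypothetical and do not come from the talk; only the Python standard library is used.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return (lift, z, p_value) for a two-sided two-proportion z-test.

    Uses the normal approximation; for very small counts an exact test
    (e.g. Fisher's exact test) would be the more appropriate choice.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one observation")
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # All outcomes identical in both groups: no evidence of a difference.
        return rate_b - rate_a, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: the same 10% vs. 15% success rates at two sample sizes.
large = two_proportion_z_test(40, 400, 60, 400)
small = two_proportion_z_test(4, 40, 6, 40)
print("large sample: lift=%.3f z=%.2f p=%.3f" % large)
print("small sample: lift=%.3f z=%.2f p=%.3f" % small)
```

With these hypothetical counts the observed lift is identical in both runs, but only the larger sample yields a p-value below 0.05. This is the "too little data" problem in miniature: a promising idea may need a longer or larger experiment before its impact can be claimed.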