The Data Janitor returns | Daniel Molnar
Data engineering challenges, machine learning, and the role of data science in solving business problems. Covers common pitfalls, metrics, and the industry's skills shortage, with a focus on ETL tooling and the data janitor role.
- Data is still dirty and has a lot of garbage
 - Authorization and ETL tooling are important in data engineering
 - The biggest problem is probably a lack of data, not noise
 - People tend to overengineer and try to do too many things at once
 - You don’t have to have a huge team to do machine learning, but you need at least one person
 - You don’t need a whole company to get started with data science, but having a dedicated team is better
 - Data scientists are not going to solve your business problems; they just help you answer questions
 - NPS (Net Promoter Score) is an important metric for measuring customer loyalty
 - A/B testing doesn’t always give accurate results; beware of Simpson’s paradox
 - You can’t always trust data; there are many potential biases
 - There are not enough people in the world who know how to deal with data, not even 0.1% have the skills
 - There are also not enough jobs in data science to go around, not even enough to solve all the problems
 - Business intelligence and data engineering are still quite separate disciplines
 - There are more and more problems with distributed systems
 - The state of data engineering is “okay”, with some exceptions
 - Some projects are just hype, some are solving real problems
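
The Simpson's paradox warning in the A/B testing point above can be made concrete with a small sketch. The counts, the segment names ("mobile", "desktop"), and the helper functions below are all hypothetical and chosen only for illustration; they are not taken from the talk. The sketch shows how variant A can convert better in every segment while variant B still looks better once the segments are pooled, because traffic is split unevenly between them.

```python
# Hypothetical A/B test counts (not from the talk):
# (conversions, visitors) per variant and per segment.
RESULTS = {
    "A": {"mobile": (81, 87), "desktop": (192, 263)},
    "B": {"mobile": (234, 270), "desktop": (55, 80)},
}


def rate(conversions, visitors):
    """Conversion rate as a fraction between 0 and 1."""
    return conversions / visitors


def segment_rates(results):
    """Conversion rate for each variant within each segment."""
    return {
        variant: {seg: rate(*counts) for seg, counts in segments.items()}
        for variant, segments in results.items()
    }


def overall_rates(results):
    """Conversion rate for each variant with all segments pooled."""
    pooled = {}
    for variant, segments in results.items():
        conversions = sum(c for c, _ in segments.values())
        visitors = sum(n for _, n in segments.values())
        pooled[variant] = rate(conversions, visitors)
    return pooled


def shows_simpsons_paradox(results):
    """True if A wins in every segment but B wins on the pooled data."""
    per_segment = segment_rates(results)
    pooled = overall_rates(results)
    a_wins_each_segment = all(
        per_segment["A"][seg] > per_segment["B"][seg] for seg in results["A"]
    )
    return a_wins_each_segment and pooled["B"] > pooled["A"]


if __name__ == "__main__":
    print("per segment:", segment_rates(RESULTS))
    print("pooled:", overall_rates(RESULTS))
    print("paradox present:", shows_simpsons_paradox(RESULTS))
```

With these made-up counts, most of A's traffic lands in the harder desktop segment and most of B's in the easier mobile segment, which is what flips the pooled comparison. Checking results per segment before trusting the aggregate is one way to catch this.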