The Data Janitor returns | Daniel Molnar

Daniel Molnar

Data engineering challenges, machine learning, and the role of data science in solving business problems. The talk addresses common pitfalls, metrics, and the industry's skills shortage, with a focus on ETL tooling and the work of the data janitor.

Key takeaways

Data is still dirty and has a lot of garbage
Authorization and ETL tooling are important in data engineering
The biggest problem is probably not enough data, not noise
People tend to overengineer and try to do too many things at once
You don’t need a huge team to do machine learning, but you need at least one person
You don’t need a whole company to get started with data science, but having a dedicated team is better
Data scientists are not going to solve your business problems, they just help you answer questions
NPS (Net Promoter Score) is an important metric for measuring customer loyalty
A/B testing doesn’t always give accurate results, beware of Simpson’s paradox
You can’t always trust data, there are many potential biases
There are not enough people in the world who know how to deal with data, not even 0.1% have the skills
There are also not enough jobs in data science to go around, not even enough to solve all the problems
Business intelligence and data engineering are still quite separate disciplines
There are more and more problems with distributed systems
The state of data engineering is “okay”, with some exceptions
Some projects are just hype, some are solving real problems
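
The takeaway about A/B testing and Simpson's paradox can be made concrete with a small sketch. The counts below are hypothetical and do not come from the talk, and the segment names ("desktop", "mobile") and function names are assumptions chosen for illustration. The sketch shows how one variant can win in every segment and still lose once the segments are pooled, because the two variants received very different mixes of traffic.

```python
from fractions import Fraction

# Hypothetical (conversions, visitors) per variant and segment.
# These numbers are illustrative only; they are not data from the talk.
RESULTS = {
    "A": {"desktop": (81, 87), "mobile": (192, 263)},
    "B": {"desktop": (234, 270), "mobile": (55, 80)},
}


def segment_rate(variant, segment):
    """Conversion rate of one variant within one segment."""
    conversions, visitors = RESULTS[variant][segment]
    return Fraction(conversions, visitors)


def overall_rate(variant):
    """Conversion rate of one variant with all segments pooled."""
    conversions = sum(c for c, _ in RESULTS[variant].values())
    visitors = sum(v for _, v in RESULTS[variant].values())
    return Fraction(conversions, visitors)


def winner_by_segment():
    """Variant with the higher conversion rate in each segment."""
    segments = RESULTS["A"].keys()
    return {s: max(RESULTS, key=lambda v: segment_rate(v, s)) for s in segments}


def winner_overall():
    """Variant with the higher pooled conversion rate."""
    return max(RESULTS, key=overall_rate)


if __name__ == "__main__":
    for segment, winner in winner_by_segment().items():
        print(f"{segment}: variant {winner} wins")
    print(f"overall: variant {winner_overall()} wins")
```

With these counts, variant A wins in both segments while variant B wins on the pooled totals. Checking results per segment, and checking how traffic was split across segments, is one way to catch this before drawing a conclusion from the aggregate alone.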
