The Data Janitor returns | Daniel Molnar
A discussion of data engineering challenges, machine learning, and the role of data science in solving business problems. Topics include common pitfalls, metrics, and the industry's skills shortage, with a focus on ETL tooling, data janitorial work, and more.
- Data is still dirty and full of garbage
- Authorization and ETL tooling are important in data engineering
- The biggest problem is probably too little data, not too much noise
- People tend to overengineer and try to do too many things at once
- You don’t need a huge team to do machine learning, but you do need at least one person
- You don’t need a whole company to get started with data science, but having a dedicated team is better
- Data scientists are not going to solve your business problems; they just help you answer questions
- NPS (Net Promoter Score) is an important metric for measuring customer loyalty
- A/B testing doesn’t always give accurate results; beware of Simpson’s paradox
- You can’t always trust data; there are many potential biases
- There are not enough people in the world who know how to deal with data; not even 0.1% have the skills
- There are also not enough data science jobs to go around, nor enough to solve all the problems
- Business intelligence and data engineering are still quite separate disciplines
- There are more and more problems with distributed systems
- The state of data engineering is “okay”, with some exceptions
- Some projects are just hype; others are solving real problems
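The Net Promoter Score mentioned in the list is conventionally computed from answers to "How likely are you to recommend us?" on a 0 to 10 scale: 9 to 10 counts as a promoter, 7 to 8 as a passive, 0 to 6 as a detractor, and the score is the percentage of promoters minus the percentage of detractors. The sketch below assumes that standard definition; the function name `net_promoter_score` and the sample ratings are hypothetical and for illustration only.

```python
from typing import Iterable


def net_promoter_score(ratings: Iterable[int]) -> float:
    """Return NPS in [-100, 100] from 0-10 'likely to recommend' ratings.

    Hypothetical helper assuming the standard NPS buckets:
    promoters 9-10, passives 7-8, detractors 0-6.
    """
    scores = list(ratings)
    if not scores:
        raise ValueError("need at least one rating")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("ratings must be on the 0-10 scale")

    promoters = sum(1 for s in scores if s >= 9)   # 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # 0 through 6; 7-8 are passives
    # Passives count toward the total but toward neither bucket.
    return 100.0 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    # Hypothetical survey responses, not real data.
    survey = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
    print(net_promoter_score(survey))
```

With 5 promoters, 2 passives, and 3 detractors out of 10 responses, the sample works out to 100 × (5 − 3) / 10 = 20. Note that passives dilute the score without moving it in either direction, which is one reason NPS is usually tracked as a trend rather than read as an absolute number.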
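The warning about Simpson's paradox in A/B testing can be made concrete: a variant can win in every user segment and still lose when the segments are pooled, because the two variants received different traffic mixes. The sketch below uses hypothetical conversion counts split by a hypothetical "device" segment (they borrow the proportions of a well-known textbook example, not results from any real test); the names `Cell`, `RESULTS`, `pooled`, and `simpsons_paradox` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    conversions: int
    visitors: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visitors


# Hypothetical A/B results per segment; not real data.
RESULTS = {
    "A": {"desktop": Cell(81, 87), "mobile": Cell(192, 263)},
    "B": {"desktop": Cell(234, 270), "mobile": Cell(55, 80)},
}


def pooled(variant: str) -> Cell:
    """Aggregate a variant's segments, ignoring the segment split."""
    cells = RESULTS[variant].values()
    return Cell(
        sum(c.conversions for c in cells),
        sum(c.visitors for c in cells),
    )


def simpsons_paradox(a: str = "A", b: str = "B") -> bool:
    """True if `a` beats `b` in every segment but loses once segments are pooled."""
    wins_every_segment = all(
        RESULTS[a][seg].rate > RESULTS[b][seg].rate for seg in RESULTS[a]
    )
    loses_pooled = pooled(a).rate < pooled(b).rate
    return wins_every_segment and loses_pooled


if __name__ == "__main__":
    for variant in RESULTS:
        for segment, cell in RESULTS[variant].items():
            print(f"{variant} {segment:8s} {cell.rate:.1%}")
        print(f"{variant} {'pooled':8s} {pooled(variant).rate:.1%}")
    print("Simpson's paradox:", simpsons_paradox())
```

Here A converts better on both desktop and mobile, yet B looks better overall because most of B's visitors came from the higher-converting desktop segment while most of A's came from mobile. Comparing only the pooled rates would pick the wrong variant; checking the per-segment breakdown (or ensuring balanced randomization across segments) is what exposes the confounder.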