Be kind to yourself! Spend (less) time on Data Exploration - Willem Hendriks | PyData Eindhoven 2021
Learn why data exploration is crucial in machine learning and how to make the most of this often-overlooked step. Discover tips, tools, and strategies to extract insights and improve your models.
- Data exploration is crucial, but often overlooked or rushed in machine learning.
- It’s important to take the time to understand the data, as it can lead to better feature engineering.
- Willem used the example of California housing data, where he went back to the data to understand it better.
- Data exploration can be a creative process, and it’s important to be patient and not rush through it.
- There are many data exploration tools available, and each has its own strengths and weaknesses.
- Willem likes Dabl, Sweetviz, and GraphZ for data exploration, but warns that there is no silver bullet: the best tool depends on the specific problem.
- Data exploration can be used to create features, improve the model, and track experiments.
- It’s important to go back to the data after building a model to understand the errors and improve the model further.
- Data exploration is not always easy and can involve manual labour, but it’s worth it in the end.
- Andrew Ng’s new company is focused on AI and machine learning, and Willem credits him with popularizing the term “feature engineering”.
- Willem suggests combining data exploration tools, such as Dabl and Sweetviz, to gain insights into the data.
- Data exploration is not a one-size-fits-all solution, and different approaches may be necessary for different problems.
- Willem likes the idea of using a structured approach to data exploration, but acknowledges that it may not always be possible.
- He also suggests sitting down with the data over a cup of coffee to gain insights.
- The most important thing is to be willing to take the time to understand the data and to be patient during the data exploration process.
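
Two of the points above, taking a first look at each column and going back to the data after modelling to see where the errors sit, can be sketched in a few lines of plain Python. This is a minimal illustration and not code from the talk: the rows are invented toy values (not the California housing data), and the helper names `profile` and `mean_abs_error_by` are hypothetical. In practice a tool such as Dabl or Sweetviz would produce a much richer report.

```python
from statistics import mean, median

# Toy rows invented for illustration only; they are NOT taken from the
# California housing data. "predicted" stands in for some model's output.
rows = [
    {"median_income": 8.3, "house_age": 41, "region": "coast", "price": 4.5, "predicted": 4.1},
    {"median_income": 7.2, "house_age": 21, "region": "coast", "price": 3.6, "predicted": 3.5},
    {"median_income": 5.6, "house_age": 52, "region": "coast", "price": 3.4, "predicted": 2.6},
    {"median_income": 3.1, "house_age": None, "region": "inland", "price": 1.2, "predicted": 1.3},
    {"median_income": 2.4, "house_age": 30, "region": "inland", "price": 0.9, "predicted": 1.0},
]


def profile(rows, column):
    """Summarize one numeric column: missing count, min, median, max."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }


def mean_abs_error_by(rows, key, target="price", prediction="predicted"):
    """Group rows by `key` and return the mean absolute error per group."""
    errors = {}
    for r in rows:
        errors.setdefault(r[key], []).append(abs(r[target] - r[prediction]))
    return {group: mean(errs) for group, errs in errors.items()}


# First pass: what does a single column look like, and is anything missing?
print(profile(rows, "house_age"))

# After modelling: which slice of the data does the model get most wrong?
print(mean_abs_error_by(rows, "region"))
```

In this toy data the errors are larger for the `coast` rows than for the `inland` rows, which is the kind of observation that would send you back to the data to look for a missing feature.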