Scale EDA & ML Workloads To Clusters & Back With Dask | PyData Chicago January 2022 Meetup

Python

Discover how to scale EDA and ML workloads to clusters and back using Dask, a parallel computing library for Python, and learn how to speed up computations and simulations for data science and scientific computing applications.

Key takeaways

Dask is a parallel computing library for Python that allows users to scale their code up to clusters.
It can be used to speed up computational tasks by dividing them into smaller ones that can be executed in parallel.
Dask can be deployed on clusters, for example ones managed by Kubernetes, and is used in various industries, including finance, broadcasting, and medical research.
It’s ideal for tasks that require load-balancing and are I/O bound.
Dask can be used to speed up simulations by splitting them into smaller chunks that can be executed in parallel.
It’s useful for tasks like data processing, data science, and scientific computing.
Dask can be used with Jupyter notebooks and is compatible with popular data science libraries such as Pandas and scikit-learn.
It can also work alongside other languages, such as R and SQL (for SQL, through the separate dask-sql project).
Dask scales to datasets larger than a single machine’s memory by partitioning them and spreading the work across multiple workers.
Dask can be used to distribute any Python code to a cluster, including data science and scientific computing applications.
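
As a rough illustration of the takeaways about splitting a simulation into smaller chunks, here is a minimal sketch using dask.delayed. The Monte Carlo estimate of pi, the function names simulate_chunk and estimate_pi, and the sample and chunk counts are all assumptions chosen for illustration; none of them come from the talk.

```python
import random

from dask import delayed


def simulate_chunk(n_samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples=200_000, n_chunks=8):
    """Split one simulation into n_chunks independent tasks and combine the results."""
    per_chunk = total_samples // n_chunks
    # delayed() records each call as a task in a graph instead of running it now.
    tasks = [delayed(simulate_chunk)(per_chunk, seed) for seed in range(n_chunks)]
    total_hits = delayed(sum)(tasks)
    # compute() hands the graph to a scheduler; "threads" is a local one
    # (an illustrative choice, not a recommendation from the talk).
    hits = total_hits.compute(scheduler="threads")
    return 4.0 * hits / (per_chunk * n_chunks)


if __name__ == "__main__":
    print(estimate_pi())
```

The threaded scheduler is used here only to keep the sketch dependency-light. A pure-Python loop like this one holds the GIL, so a real speedup would need the process-based scheduler or a dask.distributed cluster. With dask.distributed installed, creating a Client before calling compute() (and dropping the scheduler argument) sends the same task graph to that cluster, so the task definitions themselves stay unchanged when moving between a laptop and a cluster.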
