Dask DataFrame is fast now - Florian Jetter (Coiled) @ PyData Südwest

Discover how Dask DataFrame has become a fast and robust engine for large-scale data processing, outperforming Spark in many cases, with a new optimizer, efficient shuffle algorithm, and opportunities for community contributions.

Key takeaways

Dask DataFrame is now fast and can compete with other engines like Spark.
Dask is now more robust and can handle large-scale data processing.
The legacy version of Dask DataFrame performed well only when data types were chosen carefully and the query was written efficiently by hand.
The new query optimizer is now enabled by default and rewrites queries for better performance.
Optimizations such as column projection and predicate pushdown are applied by the optimizer.
Dask DataFrame outperforms Spark in many cases.
The new shuffle algorithm in Dask is more efficient.
The performance of Dask can be improved with contributions from the community.
Dask can be used for tasks like argument mining, legal technology, and natural language processing.
Dask applies to a wide range of use cases that involve large-scale data processing.
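
To make the column projection and predicate pushdown takeaway concrete, here is a minimal sketch in plain Python. It does not use Dask, and every name in it (ROWS, read_table, naive_query, optimized_query) is hypothetical and invented for illustration. It only shows the general idea behind this kind of rewrite: handing the needed columns and the filter to the reader, so that less data is loaded while the result stays the same.

```python
# Toy illustration only: none of these names are Dask APIs.
ROWS = [
    {"id": 1, "city": "Karlsruhe", "amount": 10.0, "note": "a"},
    {"id": 2, "city": "Stuttgart", "amount": 25.0, "note": "b"},
    {"id": 3, "city": "Karlsruhe", "amount": 40.0, "note": "c"},
]


def read_table(columns=None, predicate=None):
    """Pretend file reader; returns (rows, number of cell values loaded)."""
    loaded_rows = []
    loaded_cells = 0
    for row in ROWS:
        # Predicate pushdown: rows that fail the filter are never loaded.
        if predicate is not None and not predicate(row):
            continue
        # Column projection: only the requested columns are kept.
        kept = row if columns is None else {c: row[c] for c in columns}
        loaded_cells += len(kept)
        loaded_rows.append(kept)
    return loaded_rows, loaded_cells


def naive_query():
    """Executes the query as written: read everything, then filter, then select."""
    rows, cells = read_table()
    rows = [r for r in rows if r["city"] == "Karlsruhe"]
    return [r["amount"] for r in rows], cells


def optimized_query():
    """Same query after the rewrite: filter and column list go to the reader."""
    rows, cells = read_table(
        columns=["amount"],
        predicate=lambda r: r["city"] == "Karlsruhe",
    )
    return [r["amount"] for r in rows], cells


naive_result, naive_cells = naive_query()
fast_result, fast_cells = optimized_query()
print(naive_result == fast_result, naive_cells, fast_cells)
```

Both functions return the same amounts, but the rewritten query loads far fewer values. A real engine reads columnar files and can skip data using file metadata, which this toy reader does not model.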
