Dask DataFrame is fast now - Florian Jetter (Coiled) @ PyData Südwest

Discover how a new query optimizer and a more efficient shuffle algorithm have made Dask DataFrame a fast and robust engine for large-scale data processing that outperforms Spark in many cases, and where the community can still contribute.

Key takeaways
  • Dask DataFrame is now fast and can compete with other engines like Spark.
  • Dask is more robust than before and handles large-scale data processing.
  • The legacy version of Dask DataFrame was implemented correctly, but it left optimizations such as data type choices to the user.
  • The new query optimizer is now the default in Dask DataFrame and rewrites queries for better performance.
  • Optimizations such as column projection and predicate pushdown are now applied by the optimizer.
  • Dask outperforms Spark in many cases.
  • The new shuffle algorithm in Dask is more efficient than the legacy one.
  • The performance of Dask can be improved with contributions from the community.
  • Dask can be used for tasks like argument mining, legal technology, and natural language processing.
  • Dask covers a wide range of use cases, including big data workloads.