Dask DataFrame is fast now - Florian Jetter (Coiled) @ PyData Südwest
Discover how Dask DataFrame has become a fast and robust engine for large-scale data processing, outperforming Spark in many cases, with a new optimizer, efficient shuffle algorithm, and opportunities for community contributions.
- Dask DataFrame is now fast and can compete with other engines like Spark.
- Dask is now more robust and can handle large-scale data processing.
- The legacy version of Dask DataFrame focused on optimizing data types and on being correctly implemented.
- The new optimizer in Dask is now enabled by default and is built for performance.
- Features such as column projection and predicate pushdown remain part of the current feature set.
- Dask outperforms Spark in many cases.
- The new shuffle algorithm in Dask is more efficient.
- The performance of Dask can be improved with contributions from the community.
- Dask can be used for tasks like argument mining, legal technology, and natural language processing.
- Dask has a wide range of applications across many use cases.
- Dask can be used for large-scale data processing and can handle big data.
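
The column projection and predicate pushdown mentioned in the list above can be illustrated with a small, library-independent sketch. It is not Dask's implementation and uses no Dask API: the sample data and the `scan`, `naive_query`, and `optimized_query` helpers are hypothetical names invented for this illustration. The idea is that an optimizer rewrites a query so that the row filter and the column selection are applied while the data is being read, instead of after everything has been loaded.

```python
import csv
import io

# Hypothetical sample data standing in for a large on-disk table.
RAW = """id,city,amount,comment
1,Berlin,10,a
2,Karlsruhe,25,b
3,Berlin,40,c
4,Stuttgart,5,d
"""


def scan(source, columns=None, predicate=None):
    """Read rows, applying the filter and column selection during the scan."""
    reader = csv.DictReader(io.StringIO(source))
    for row in reader:
        # Predicate pushdown: drop non-matching rows while reading,
        # so later steps never see them.
        if predicate is not None and not predicate(row):
            continue
        # Column projection: keep only the columns the query needs.
        if columns is not None:
            row = {name: row[name] for name in columns}
        yield row


def naive_query(source):
    """Load everything first, then filter rows, then select columns."""
    rows = list(scan(source))
    filtered = [r for r in rows if r["city"] == "Berlin"]
    return [{"id": r["id"], "amount": r["amount"]} for r in filtered]


def optimized_query(source):
    """Same result, with filter and column selection pushed into the scan."""
    return list(
        scan(
            source,
            columns=["id", "amount"],
            predicate=lambda r: r["city"] == "Berlin",
        )
    )


if __name__ == "__main__":
    print(naive_query(RAW))
    print(optimized_query(RAW))
```

Both functions return the same rows; the difference is how much data is held in memory along the way. With a row-oriented text format such as CSV the whole file is still parsed, so the savings are mainly in memory. With a columnar format such as Parquet, unneeded columns can be skipped at read time, which is where these optimizations tend to pay off most.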