Jay Chia - Blazing fast I/O of data in the cloud with Daft Dataframes | PyData Global 2023
Discover Daft, a blazing fast I/O library for cloud data processing, featuring native Rust types, efficient metadata pruning, and multithreading capabilities for scalable performance.
- Daft provides blazing fast I/O performance for data in the cloud, leveraging native Rust types and efficient metadata pruning.
- Daft can load dataframes from data stored in S3, in formats such as Parquet and JSON, and supports filters and projections on reads.
- By using Rust’s multithreading capabilities, Daft can scale linearly with the number of cores available.
- Daft’s design handles small files efficiently: it can read 10,000 small CSV files in under 2.5 seconds.
- The library uses intelligent batching and retry policies to optimize read-ahead buffering and minimize network bandwidth usage.
- Daft supports various file formats, including Parquet, CSV, and JSON, and can handle dataframes with complex data types.
- The library has been tested on real-world data sets, showing significant performance improvements compared to other libraries.
- Daft’s architecture is designed to be highly parallel and scalable.
- The library is available as a Python package and can be installed via pip.
- Daft has been used in production environments, including at Amazon, where it has shown significant performance improvements.
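
The metadata pruning mentioned in the list above can be illustrated with a small, library-independent sketch. It is not Daft's implementation. It only shows the general idea of using per-chunk min/max statistics (as stored, for example, in Parquet row-group metadata) to skip chunks that cannot match a filter, so they are never fetched over the network. The names `ChunkStats` and `prune_chunks` are hypothetical and the statistics are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkStats:
    """Hypothetical per-chunk metadata: the value range of one column."""
    name: str
    min_value: int
    max_value: int


def prune_chunks(chunks, lower, upper):
    """Keep only chunks whose [min, max] range overlaps [lower, upper].

    Chunks that fall entirely outside the filter range cannot contain
    matching rows, so a reader can skip them without downloading them.
    """
    return [c for c in chunks if c.max_value >= lower and c.min_value <= upper]


# Illustrative statistics for three chunks of a single file.
chunks = [
    ChunkStats("chunk-0", 0, 99),
    ChunkStats("chunk-1", 100, 199),
    ChunkStats("chunk-2", 200, 299),
]

# Filter: 150 <= value <= 250. Only overlapping chunks need to be read.
to_read = prune_chunks(chunks, 150, 250)
print([c.name for c in to_read])  # ['chunk-1', 'chunk-2']
```

Here `chunk-0` is skipped using metadata alone. With cloud storage, each skipped chunk is a network request that is never made, which is why this kind of pruning matters for I/O performance.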