Jay Chia - Building Daft: Python + Rust = a better distributed query engine | SciPy 2024
Learn how Daft combines Python's simplicity with Rust's performance in a distributed query engine, with 2-6x speedups from moving computation out of Python and into Rust.
-
Daft is a Python DataFrame library with a core written in Rust, combining Python’s ease of use with Rust’s performance for distributed query processing
-
Key advantages of using Rust with Python:
- Avoids the limits of Python's global interpreter lock (GIL) by multi-threading in Rust
- Provides stable memory usage and efficient resource utilization
- Runs high-performance native code while keeping a Python interface
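The GIL point can be seen with the standard library alone. The sketch below is an analogy, not Daft code: it uses `hashlib`, whose C implementation releases the GIL while hashing large buffers, to stand in for any native routine (such as one written in Rust) that does its work outside the interpreter lock. The buffer sizes, worker count, and the helper names `digest` and `timed` are assumptions chosen for illustration.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

# Four 8 MiB buffers, large enough that hashing dominates thread overhead.
BUFFERS = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]


def digest(buf: bytes) -> str:
    # hashlib releases the GIL for large inputs, so several of these calls
    # can run on different cores at the same time.
    return hashlib.sha256(buf).hexdigest()


def timed(fn):
    # Hypothetical helper: run fn() and return (result, elapsed seconds).
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Same work, first on one thread, then spread over four threads.
sequential, t_seq = timed(lambda: [digest(b) for b in BUFFERS])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_thr = timed(lambda: list(pool.map(digest, BUFFERS)))

print(f"sequential: {t_seq:.3f}s, threaded: {t_thr:.3f}s")
```

On a multi-core machine the threaded run is typically faster, because the hashing happens in native code that is not holding the GIL. A pure-Python loop in the same thread pool would usually show no such gain. Timings vary by machine, so none are quoted here.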
-
Daft’s architecture:
- Core execution happens in Rust, behind a thin Python wrapper layer
- Uses a lazy execution model, so query plans can be optimized before they run
- Leverages Ray for distributed computing capabilities
- Supports multimodal data (tables, images, unstructured data)
-
Performance improvements demonstrated:
- 2-6x speedups by moving computation from Python to Rust
- Efficient multi-threading in Rust, outside the GIL
- Memory-efficient data handling for large-scale processing
-
Integration approach:
- Simple pip install for Python users
- Can be adopted incrementally in existing workflows
- Python-friendly API despite Rust internals
- Works locally on a laptop or distributed in the cloud
-
Target use cases:
- Analytics and data engineering
- Machine learning data preprocessing
- Large-scale distributed computation
- Processing terabytes to petabytes of data
-
Positioned as an alternative to JVM-based engines (Spark) and local tools (pandas, Polars), with a focus on a Python-first experience backed by Rust’s performance