Jay Chia - Building Daft: Python + Rust = a better distributed query engine | SciPy 2024
Learn how Daft combines Python's simplicity with Rust's performance to build a distributed query engine, with 2-6x speedups from moving computation into Rust and efficient data processing.
- Daft is a Python DataFrame library implemented in Rust. It pairs Python's ease of use with Rust's performance for distributed query processing.
- Key advantages of using Rust with Python:
  - Sidesteps the limits of Python's GIL by running multi-threaded work in Rust
  - Provides stable memory usage and efficient resource utilization
  - Runs high-performance native code behind a Python interface
 
- Daft's architecture:
  - Core execution happens in Rust, behind a thin Python wrapper layer
  - Uses a lazy execution model so query plans can be optimized before they run
  - Leverages Ray for distributed computing
  - Supports multimodal data (tables, images, unstructured data)
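The lazy execution model mentioned above can be illustrated with a toy example. This is a minimal sketch in plain Python, not Daft's API or its optimizer: the `LazyFrame` class, its `select`/`where`/`collect` methods, and the single "run filters first" rewrite rule are all hypothetical names invented for illustration. The point is only that operations are recorded as a plan, and the plan can be reordered before any data is touched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class LazyFrame:
    """Toy lazy dataframe (hypothetical): records operations, runs nothing."""
    rows: List[Row]
    plan: List[tuple] = field(default_factory=list)

    def select(self, *columns: str) -> "LazyFrame":
        # Return a new frame with the projection appended to the plan.
        return LazyFrame(self.rows, self.plan + [("select", columns)])

    def where(self, column: str, predicate: Callable[[object], bool]) -> "LazyFrame":
        # Return a new frame with the filter appended to the plan.
        return LazyFrame(self.rows, self.plan + [("where", column, predicate)])

    def optimized_plan(self) -> List[tuple]:
        # Single rewrite rule for this sketch: run every filter before any
        # projection, so fewer rows reach the later steps. This is valid
        # here because a valid plan only filters on columns that exist in
        # the source rows.
        filters = [op for op in self.plan if op[0] == "where"]
        others = [op for op in self.plan if op[0] != "where"]
        return filters + others

    def collect(self) -> List[Row]:
        # Execution happens only here, against the optimized plan.
        rows = self.rows
        for op in self.optimized_plan():
            if op[0] == "where":
                _, column, predicate = op
                rows = [r for r in rows if predicate(r[column])]
            else:
                _, columns = op
                rows = [{c: r[c] for c in columns} for r in rows]
        return rows


df = LazyFrame([
    {"id": 1, "size": 10, "path": "a.png"},
    {"id": 2, "size": 250, "path": "b.png"},
    {"id": 3, "size": 900, "path": "c.png"},
])

# Building the query records two steps but executes nothing.
query = df.select("id", "size").where("size", lambda s: s > 100)

# The filter is moved ahead of the projection, then the plan runs.
result = query.collect()
```

A real engine applies many more rewrites (and Daft does its planning and execution in Rust), but the record-then-optimize-then-execute shape is the same idea.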
 
- Performance improvements demonstrated:
  - 2-6x speedups from moving computation out of Python and into Rust
  - Efficient multi-threading in Rust, unconstrained by the GIL
  - Memory-efficient data handling for large-scale processing
 
- Integration approach:
  - Simple `pip install` for Python users
  - Can be adopted incrementally in existing workflows
  - Python-friendly API despite the Rust internals
  - Runs locally on a laptop or distributed in the cloud
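The idea that the same query can run locally or on a cluster can be sketched with a pluggable runner. This is an illustrative toy in standard-library Python, not Daft's or Ray's API: `split`, `run_local`, `run_pooled`, and `total_of_squares` are hypothetical names, and a thread pool stands in for remote workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Partition = List[int]
Task = Callable[[Partition], int]


def split(data: Sequence[int], num_partitions: int) -> List[Partition]:
    """Split the data round-robin into num_partitions chunks."""
    return [list(data[i::num_partitions]) for i in range(num_partitions)]


def run_local(task: Task, partitions: List[Partition]) -> List[int]:
    """Hypothetical local runner: process partitions one after another."""
    return [task(p) for p in partitions]


def run_pooled(task: Task, partitions: List[Partition]) -> List[int]:
    """Hypothetical pooled runner: a thread pool stands in for workers.

    Pure-Python threads are still serialized by the GIL for CPU-bound
    work; this only demonstrates the dispatch pattern.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(task, partitions))


def total_of_squares(data: Sequence[int], runner, num_partitions: int = 4) -> int:
    # The query is written once; the runner decides where partitions execute.
    partials = runner(lambda part: sum(x * x for x in part),
                      split(data, num_partitions))
    return sum(partials)


data = list(range(1, 101))
local_total = total_of_squares(data, run_local)
pooled_total = total_of_squares(data, run_pooled)
```

Because the query logic never refers to the runner's internals, swapping the local runner for a distributed one changes where the work happens without changing the user's code.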
 
- Target use cases:
  - Analytics and data engineering
  - Machine learning data preprocessing
  - Large-scale distributed computation
  - Processing terabytes to petabytes of data
 
- Positioned as an alternative to JVM-based engines (Spark) and local tools (pandas, Polars), with a Python-first experience backed by Rust's performance