Akshay Gupta - When is a compiled language like Rust beneficial for Data Scientists? | SciPy 2024

Akshay Gupta

Explore when Rust's performance benefits outweigh its learning curve for data science tasks. Learn key considerations for adopting Rust vs Python and practical hybrid approaches.

Key takeaways
  • Rust shows promising performance improvements (50-200x speedups) for data science workloads, but comes with a significant learning curve and maintainability challenges

  • Python remains the best default choice for data scientists due to its ecosystem, flexibility, and low barrier to entry

  • Polars (a Python library written in Rust) offers major performance gains without requiring direct Rust knowledge, making it a good middle-ground solution

  • Selective use of Rust for computationally intensive components, while keeping the main codebase in Python, may be optimal; full rewrites in Rust are usually not justified

  • Distribution and cross-compilation of Rust code present significant challenges compared to Python

  • Memory safety benefits of Rust are valuable (Microsoft attributes roughly 70% of its vulnerabilities to memory safety issues, the class of bug Rust is designed to prevent) but must be weighed against development time costs

  • Team and organizational adoption of Rust faces barriers such as:

    • Limited number of developers who can maintain the code
    • Longer development cycles
    • Steep learning curve for data scientists
    • Compilation overhead impacting interactive development

  • The Rust compiler provides excellent guidance, but compile times slow down the interactive development workflow data scientists prefer

  • Benchmarking shows Rust outperforms Numba for recursive calculations, but vectorized operations benefit less

  • Consider the actual business impact of performance improvements: cutting a runtime from 45 to 35 minutes may not meaningfully change workflows
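
One way to reason about the gap between a large component-level speedup and a modest end-to-end gain is Amdahl's law: if only part of a pipeline is rewritten in Rust, the untouched part still runs at its original speed. The sketch below is a rough illustration, not a result from the talk. The function name `overall_runtime` and the hot-path fractions are hypothetical; the 45-minute baseline and the 50x and 200x figures are borrowed from the takeaways above.

```python
def overall_runtime(total_minutes, hot_fraction, hot_speedup):
    """Estimate end-to-end runtime when only the hot path is accelerated.

    Applies Amdahl's law: the portion of the runtime outside the hot path
    is unchanged, and the hot path is divided by its speedup factor.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0:
        raise ValueError("hot_speedup must be positive")
    untouched = total_minutes * (1.0 - hot_fraction)
    accelerated = total_minutes * hot_fraction / hot_speedup
    return untouched + accelerated


if __name__ == "__main__":
    baseline = 45.0  # minutes, the baseline figure mentioned in the takeaways
    # Hypothetical shares of the runtime spent in the rewritten component.
    for fraction in (0.2, 0.5, 0.9):
        for speedup in (50, 200):
            after = overall_runtime(baseline, fraction, speedup)
            print(
                f"hot path {fraction:.0%} of runtime, {speedup}x faster: "
                f"{baseline:.0f} min -> {after:.1f} min"
            )
```

Under these assumptions, the end-to-end result is driven mostly by how much of the runtime the rewritten component accounts for, and far less by whether that component is 50x or 200x faster. Profiling the existing Python code first gives a realistic value for `hot_fraction` before any Rust is written.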