Akshay Gupta - When is a compiled language like Rust beneficial for Data Scientists? | SciPy 2024

Explore when Rust's performance benefits outweigh its learning curve for data science tasks. Learn key considerations for adopting Rust vs Python and practical hybrid approaches.

Key takeaways
  • Rust shows promising performance improvements (50-200x speedups) on some data science workloads, but it comes with a significant learning curve and maintainability challenges

  • Python remains the best default choice for data scientists due to its ecosystem, flexibility, and low barrier to entry

  • Polars (a DataFrame library written in Rust, with a Python API) offers major performance gains without requiring any Rust knowledge, which makes it a good middle-ground solution

  • Using Rust selectively for computationally intensive components, while keeping the main codebase in Python, may be the best approach; full rewrites in Rust are usually not justified

  • Distributing and cross-compiling Rust code present significant challenges compared to Python

  • Rust's memory-safety guarantees are valuable (Microsoft has reported that about 70% of the security vulnerabilities it assigns CVEs to are memory-safety bugs), but they must be weighed against development-time costs

  • Team and organizational adoption of Rust faces barriers such as:

    • Limited number of developers who can maintain the code
    • Longer development cycles
    • Steep learning curve for data scientists
    • Compilation overhead impacting interactive development

  • The Rust compiler provides excellent guidance, but compile times slow down the interactive development workflow that data scientists prefer

  • In the benchmarks presented, Rust outperformed Numba on recursive calculations, but vectorized operations benefited much less from Rust

  • Consider the actual business impact of a performance improvement: cutting a runtime from 45 to 35 minutes may not meaningfully change workflows
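The selective-Rust and Rust-vs-Numba takeaways can be illustrated with a minimal sketch of the hybrid pattern. The talk summary does not say which recursive calculations were benchmarked, so an exponential moving average (EMA) stands in here as a hypothetical example: each output depends on the previous output, so unlike an elementwise operation it cannot be written as a single elementwise NumPy expression and ends up as an interpreted loop in plain Python. One common explanation for why vectorized code gains less (an assumption, not stated in the summary) is that vectorized NumPy operations already run in compiled code. The module name `fast_kernels` and its `ema` function are hypothetical, standing for a Rust extension built with something like PyO3 and maturin; the code falls back to pure Python when that module is not installed.

```python
"""Hybrid pattern sketch: Python orchestration with an optional compiled kernel."""
from typing import List, Sequence


def ema_python(values: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average in pure Python.

    out[0] = values[0]
    out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]

    Each step reads the previous output (a loop-carried dependency), so this
    loop cannot be replaced by one elementwise array expression.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    prev = 0.0
    for i, x in enumerate(values):
        prev = x if i == 0 else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


try:
    # Hypothetical Rust extension module (e.g. built with PyO3 + maturin)
    # assumed to expose ema(values, alpha) with the same semantics as above.
    from fast_kernels import ema as _ema_compiled
except ImportError:
    _ema_compiled = None


def ema(values: Sequence[float], alpha: float) -> List[float]:
    """Use the compiled kernel when it is installed, else fall back to Python."""
    if _ema_compiled is not None:
        return list(_ema_compiled(values, alpha))
    return ema_python(values, alpha)


if __name__ == "__main__":
    print(ema([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

Because the fallback keeps the same signature, code that calls `ema` does not change when no compiled build is available, which also softens the distribution problem: platforms without a prebuilt binary still work, only more slowly.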
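For scale, the 45 vs 35 minute example in the last takeaway works out as follows. Compared with the 50-200x figures in the first takeaway, this is a modest gain, which helps explain why it may not change how people work.

```latex
\[
\frac{45 - 35}{45} = \frac{10}{45} \approx 22\% \text{ shorter runtime},
\qquad
\frac{45}{35} \approx 1.29\times \text{ speedup}
\]
```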