Hadi Abdi Khojasteh - Pandas Roadmap and Beyond [PyData Prague #19]

Learn how Pandas is evolving with PyArrow integration, copy-on-write features, and performance improvements. Discover the roadmap for Pandas 3 and recent memory efficiency gains.

Key takeaways
  • Pandas is migrating core functionality to use PyArrow for improved performance and memory efficiency, especially for string operations and data loading

  • Copy-on-write is available in Pandas 2.2 as an explicit opt-in; it makes data mutations easier to reason about and reduces memory usage

  • New datetime handling improvements allow better support for different timestamp resolutions beyond just nanoseconds

  • String operations are being modernized to use PyArrow string types by default instead of NumPy object arrays, providing better performance

  • A Pandas Enhancement Proposal (PDEP) process has been established to govern future development and community contributions

  • CSV and JSON readers can now dispatch to PyArrow readers (via engine="pyarrow"), providing up to 10x faster performance

  • Memory footprint reductions of up to 50% are possible with PyArrow implementations

  • The warning system for chained assignment is being updated: such operations will show warnings first, then errors in Pandas 3

  • The community of Pandas contributors is growing significantly, with 162 contributors to recent releases

  • In-place operations are being removed or limited to improve consistency and reduce confusion around data mutations
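
The copy-on-write and chained-assignment takeaways can be illustrated with a small sketch. It is not taken from the talk. It assumes pandas 2.x, where copy-on-write is opt-in through the mode.copy_on_write option, and the column names are made up for illustration.

```python
import warnings

import pandas as pd

# Opt in to copy-on-write. Newer pandas releases may warn that this option
# is deprecated, so warnings are silenced for the purpose of this sketch.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    pd.set_option("mode.copy_on_write", True)

# Hypothetical example data.
df = pd.DataFrame({"price": [10, 20, 30], "qty": [1, 2, 3]})

# Under copy-on-write a selection behaves like an independent copy:
# writing to it does not propagate back to the parent DataFrame.
prices = df["price"]
prices.iloc[0] = 999
parent_value = int(df.loc[0, "price"])   # parent keeps its original value

# To change the parent, update it in one explicit step rather than through
# a chained assignment such as df["price"][0] = 999.
df.loc[0, "price"] = 999
updated_value = int(df.loc[0, "price"])

print(parent_value, updated_value)
```

With copy-on-write enabled, the only way to modify df is to write to df itself, which is the behaviour the chained-assignment warnings are meant to steer users toward.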
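
The PyArrow reader and string-type takeaways could be tried out roughly as follows. This is a hedged sketch rather than a benchmark from the talk: the data is synthetic, PyArrow is treated as an optional dependency, and the actual speed and memory gains depend on the data and the installed versions.

```python
import io

import pandas as pd

try:
    import pyarrow  # noqa: F401  (optional dependency)
    HAVE_PYARROW = True
except ImportError:
    HAVE_PYARROW = False

# Synthetic CSV with 1000 rows of short strings (illustrative only).
rows = "\n".join(f"user{i},Prague" for i in range(1000))
csv_bytes = ("name,city\n" + rows + "\n").encode("utf-8")

# Ask pandas to dispatch to the PyArrow CSV reader when it is available,
# falling back to the default C engine otherwise.
engine = "pyarrow" if HAVE_PYARROW else "c"
df = pd.read_csv(io.BytesIO(csv_bytes), engine=engine)

# Memory used when the strings are stored as Python objects in a NumPy array.
names_object = df["name"].astype(object)
object_bytes = names_object.memory_usage(index=False, deep=True)
print("object dtype bytes:", object_bytes)

if HAVE_PYARROW:
    # Memory used when the same strings are stored in a PyArrow-backed array.
    names_arrow = df["name"].astype("string[pyarrow]")
    arrow_bytes = names_arrow.memory_usage(index=False, deep=True)
    print("string[pyarrow] bytes:", arrow_bytes)
```

Comparing the two printed numbers on real data is a quick way to check whether the reported memory savings apply to a particular workload.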
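
The datetime takeaway can be sketched as well. The example assumes pandas 2.0 or later, where second, millisecond, and microsecond resolutions are preserved instead of being cast to nanoseconds; the dates are arbitrary values chosen to fall outside the nanosecond range.

```python
import numpy as np
import pandas as pd

# Dates far outside the range representable with nanosecond resolution
# (roughly the years 1677 to 2262).
values = np.array(["1000-01-01", "3000-12-31"], dtype="datetime64[s]")

# Assumption: pandas 2.0+ keeps the second resolution of the input array.
stamps = pd.Series(values)
resolution = str(stamps.dtype)
years = stamps.dt.year.tolist()

# Converting between supported resolutions is an explicit astype call.
as_millis = stamps.astype("datetime64[ms]")

print(resolution, years, as_millis.dtype)
```

Older pandas versions would try to cast these values to nanoseconds and fail with an out-of-bounds error, which is the limitation the new resolutions address.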