Dewey Dunnington - Introducing nanoarrow: the world's tiniest Arrow Implementation | SciPy 2024

Learn about nanoarrow, a minimal Arrow implementation for efficient cross-language data transfer, with a small footprint and easy C integration. By Dewey Dunnington at SciPy 2024.

Key takeaways
  • nanoarrow is a minimal Arrow implementation designed for efficient data transfer between languages and systems; its C library ships as only two core files (a header and a source file)

  • Key advantages of nanoarrow include:

    • Very small footprint compared to the full Arrow C++ implementation
    • No complex dependencies
    • Efficient handling of strings and null values
    • Easy integration into C libraries
  • Primary use cases:

    • Wrapping C libraries that need Arrow functionality
    • Fast data transfer between different languages/runtimes
    • Efficient handling of large string arrays
    • Testing and development of Arrow-based functionality
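  • nanoarrow handles data representation through:

    • A separate validity buffer (one bit per element) for nulls, kept apart from the values
    • Compact string storage: an offsets buffer plus one contiguous buffer of string bytes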
    • Buffer protocol compatibility in Python
    • Support for Arrow IPC format
  • Compared to a full Arrow implementation:

    • More limited in scope (builds and consumes Arrow data, without the compute functions of the full libraries)
    • Focuses on core data transfer functionality
    • Lighter weight alternative for basic Arrow needs
    • Better suited for embedded systems or minimal dependencies
  • Successfully used in projects like:

    • Snowflake Python connector
    • GeoArrow implementations
    • Testing frameworks
    • Language bindings for C libraries
  • Particularly valuable when:

    • Working with cross-language data transfer
    • Dealing with large string datasets
    • Needing minimal dependency overhead
    • Building Arrow-compatible interfaces
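
The data representation mentioned in the takeaways (a separate validity buffer for nulls, plus offsets and a contiguous data buffer for strings) can be illustrated with a small standard-library sketch. This is not nanoarrow's API: `encode_utf8_array` and `decode_utf8_array` are hypothetical helper names, and the code only mimics the layout the Arrow columnar format uses for a string array with 32-bit offsets.

```python
import struct


def encode_utf8_array(values):
    """Encode a list of str/None into Arrow-style (validity, offsets, data) buffers.

    Hypothetical helper for illustration; not part of nanoarrow.
    """
    validity = bytearray((len(values) + 7) // 8)  # one bit per element, least significant bit first
    offsets = [0]                                 # n + 1 offsets into the data buffer
    data = bytearray()                            # all string bytes stored back to back
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)      # set bit i: element i is not null
            data += value.encode("utf-8")
        offsets.append(len(data))                 # a null adds no bytes, so its offset repeats
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)  # little-endian int32
    return bytes(validity), offsets_buf, bytes(data)


def decode_utf8_array(validity, offsets_buf, data, length):
    """Rebuild the Python list from the three buffers."""
    offsets = struct.unpack(f"<{length + 1}i", offsets_buf)
    view = memoryview(data)  # buffer protocol: slicing a memoryview does not copy
    out = []
    for i in range(length):
        if (validity[i // 8] >> (i % 8)) & 1:
            out.append(view[offsets[i]:offsets[i + 1]].tobytes().decode("utf-8"))
        else:
            out.append(None)
    return out


values = ["nano", None, "", "arrow"]
validity, offsets_buf, data = encode_utf8_array(values)
roundtrip = decode_utf8_array(validity, offsets_buf, data, len(values))
```

Note that the null and the empty string both occupy zero bytes in the data buffer; only the validity bit tells them apart. Because the whole column is three flat buffers, it can be handed to another language or runtime without converting element by element.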