Dewey Dunnington - Introducing nanoarrow: the world's tiniest Arrow Implementation | SciPy 2024

Python

Learn about nanoarrow, a minimal Arrow implementation for efficient cross-language data transfer, featuring a small footprint and easy C integration. By Dewey Dunnington at SciPy 2024.

Key takeaways

nanoarrow is a minimal Arrow implementation designed for efficient data transfer between languages and systems; its core C library consists of only two files (a header and a source file)
Key advantages of nanoarrow include:
- Very small footprint compared to the full Arrow C++ implementation
- No complex dependencies
- Efficient handling of strings and null values
- Easy integration into C libraries
Primary use cases:
- Wrapping C libraries that need Arrow functionality
- Fast data transfer between different languages/runtimes
- Efficient handling of large string arrays
- Testing and development of Arrow-based functionality
nanoarrow handles data representation through:
- A separate validity buffer (a bitmap) that tracks null values
- Efficient string encoding (offsets into one contiguous data buffer)
- Buffer protocol compatibility in Python
- Support for Arrow IPC format
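
The validity buffer and string encoding listed above can be sketched in plain Python. This is a minimal illustration of the Arrow columnar layout for a UTF-8 string array, not nanoarrow's API: the helper names `encode_utf8_array` and `decode_utf8_array` are hypothetical, and the padding and alignment that real Arrow buffers carry are omitted.

```python
import struct


def encode_utf8_array(values):
    """Encode a list of str-or-None into validity, offsets, and data buffers."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per element
    offsets = [0]                                 # length + 1 int32 offsets
    data = bytearray()                            # all string bytes, back to back
    for i, value in enumerate(values):
        if value is not None:
            # Arrow numbers validity bits from the least-significant bit.
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null element adds no bytes, so its offset repeats the previous one.
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), offsets_buf, bytes(data)


def decode_utf8_array(validity, offsets_buf, data, length):
    """Rebuild the Python list from the three buffers."""
    offsets = struct.unpack(f"<{length + 1}i", offsets_buf)
    out = []
    for i in range(length):
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


values = ["nano", None, "arrow", ""]
validity, offsets_buf, data = encode_utf8_array(values)
roundtrip = decode_utf8_array(validity, offsets_buf, data, len(values))
```

Note that the null element and the empty string both occupy zero bytes in the data buffer; only the validity bit tells them apart. Because every string lives in one contiguous buffer, a large string array is three allocations rather than one object per string.
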
Compared to the full Arrow implementation:
- More limited in scope (for example, no compute functions)
- Focuses on core data transfer functionality
- Lighter weight alternative for basic Arrow needs
- Better suited for embedded systems or minimal dependencies
Successfully used in projects like:
- Snowflake Python connector
- GeoArrow implementations
- Testing frameworks
- Language bindings for C libraries
Particularly valuable when:
- Working with cross-language data transfer
- Dealing with large string datasets
- Needing minimal dependency overhead
- Building Arrow-compatible interfaces
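
Cross-language transfer is cheap when both sides read the same memory instead of copying it, which is what Python's buffer protocol allows. The sketch below uses only the standard library to show the idea; the `array` object stands in for memory owned by a C library, and nothing here is nanoarrow's own API.

```python
from array import array

# An integer buffer standing in for memory produced by a C library.
source = array("i", [10, 20, 30, 40])

# memoryview wraps the same memory through the buffer protocol; no copy is made.
view = memoryview(source)

# Slicing a memoryview also shares memory with the original buffer.
window = view[1:3]

# A write through the original object is visible through the view,
# which demonstrates that the two share storage.
source[1] = 99
shared = window.tolist()
```

Any consumer that understands the buffer protocol can read such a view directly, so the cost of handing data over does not grow with the size of the array.
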
