Naty Clementi - Ibis + DuckDB geospatial: a match made on Earth | SciPy 2024

Learn how Ibis & DuckDB deliver fast geospatial analysis without RAM limits. Process 20GB+ datasets on your laptop using 50+ formats & 30+ operations with Python-like syntax.

Key takeaways
  • Ibis + DuckDB processes geospatial data quickly and is not limited by available RAM, so large datasets can be analyzed on a laptop

  • DuckDB recently added support for reading GeoParquet files directly and includes built-in geospatial functions such as buffer, within, and intersects

  • The combination builds on established geospatial libraries (GEOS, GDAL, PROJ) rather than reinventing their functionality

  • DuckDB’s performance improvements have made operations 3-25x faster in the last three years, enabling processing of large (20GB+) datasets in seconds instead of hours

  • Ibis provides a Python interface with familiar DataFrame-style syntax for working with geospatial data, while DuckDB executes the operations

  • No dedicated server required: DuckDB runs as an in-process analytical database on your laptop

  • Supports 50+ geospatial data formats and 30+ geospatial operations

  • Integration with visualization tools like Lonboard enables easy mapping and plotting of geospatial data

  • Can mix geospatial data with regular tables for complex analytical queries

  • Use cases include location-based services, environmental monitoring, infrastructure planning, and site selection optimization
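
To make the built-in operations mentioned above more concrete, here is a minimal sketch (not code from the talk) that uses DuckDB's Python API and its spatial extension directly. The helper names `buffer_query` and `run`, the coordinates, and the radius are hypothetical illustration values. With Ibis, a comparable query would be written as DataFrame-style expressions and executed through DuckDB.

```python
# Sketch: buffer / within / intersects with DuckDB's spatial extension.
# buffer_query and run are hypothetical helper names; the point and
# radius values passed to them are made up for illustration.

def buffer_query(x: float, y: float, radius: float) -> str:
    """Build SQL that tests point (x, y) against a buffer of `radius`
    around the origin, using ST_Buffer, ST_Within and ST_Intersects.
    Only numeric arguments are expected here."""
    point = f"ST_Point({x}, {y})"
    zone = f"ST_Buffer(ST_Point(0, 0), {radius})"
    return (
        "SELECT "
        f"ST_Within({point}, {zone}) AS is_within, "
        f"ST_Intersects({point}, {zone}) AS intersects"
    )


def run(x: float, y: float, radius: float):
    """Run the query in an in-process, in-memory DuckDB database."""
    import duckdb  # imported lazily so buffer_query works without it

    con = duckdb.connect()          # no server: the database lives in this process
    con.execute("INSTALL spatial")  # fetches the extension the first time
    con.execute("LOAD spatial")
    return con.execute(buffer_query(x, y, radius)).fetchone()


if __name__ == "__main__":
    # Print the generated SQL; call run(0.5, 0.5, 1.0) to execute it.
    print(buffer_query(0.5, 0.5, 1.0))
```

Calling `run(0.5, 0.5, 1.0)` requires the `duckdb` package and, on first use, network access to install the spatial extension.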