Gil Forsyth - Ibis: because SQL is everywhere and so is Python | SciPy 2024

Learn how Ibis bridges Python and SQL, offering a unified DataFrame API across 20+ databases. Explore data efficiently while leveraging native engine capabilities and optimization.

Key takeaways
  • Ibis is a Pythonic dataframe interface that works with 20+ different database engines and query systems, providing a consistent API across them

  • Rather than learning multiple SQL dialects or database-specific interfaces, users write against a single Python interface and let the underlying engine handle execution

  • Ibis uses lazy (deferred) execution: queries are built up step by step but only executed when results are explicitly requested, so the engine receives the whole query and can optimize it

  • For data exploration, Ibis integrates well with pandas, polars, and other PyData tools while keeping computation close to where the data resides

  • The tool aims to provide good interfaces and performance without dictating which engine to use; the choice of engine should be based on where the data lives and on performance needs

  • Ibis handles differences between SQL dialects, data types, and function names across different database systems behind a common interface

  • The project has extensive testing across all supported engines to ensure compatibility and consistent behavior

  • Key integrations include DuckDB, Snowflake, BigQuery, Postgres, pandas, polars, and PySpark, along with support for CSV and Parquet files and PyArrow data

  • For large datasets, Ibis enables keeping computation in the database engine rather than pulling all data locally

  • SQL knowledge remains valuable, but Ibis provides a more ergonomic Python interface and still allows direct SQL access when needed
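
The deferred-execution takeaway above can be illustrated with a minimal sketch. This is not Ibis's API or implementation: `LazyTable`, its methods, and the `penguins` table are hypothetical names invented for illustration, and Python's standard-library `sqlite3` module stands in for a real engine. The sketch only shows the general idea of recording query steps in Python, compiling them to SQL, and touching the database when results are requested.

```python
import sqlite3


class LazyTable:
    """Toy deferred query builder (hypothetical, not the Ibis API)."""

    def __init__(self, name, filters=(), columns=("*",)):
        self.name = name
        self.filters = tuple(filters)
        self.columns = tuple(columns)

    def filter(self, condition):
        # Returns a new expression; nothing is sent to the database yet.
        return LazyTable(self.name, self.filters + (condition,), self.columns)

    def select(self, *columns):
        # Also lazy: only records which columns were asked for.
        return LazyTable(self.name, self.filters, columns)

    def to_sql(self):
        # Compile the recorded steps into a single SQL string.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({f})" for f in self.filters)
        return sql

    def execute(self, con):
        # The only point where the engine does any work.
        return con.execute(self.to_sql()).fetchall()


# Set up a small in-memory table to query (illustrative data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, mass_g INTEGER)")
con.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 3750), ("Gentoo", 5000), ("Adelie", 3250)],
)

# Build the query step by step; no SQL has run for this expression so far.
expr = LazyTable("penguins").filter("mass_g > 3500").select("species", "mass_g")

sql = expr.to_sql()       # inspect the generated SQL without running it
rows = expr.execute(con)  # execution happens here, inside the engine

print(sql)
print(rows)
```

Because the expression is only a description of the query until `execute` is called, the filtering happens inside the engine and only the matching rows come back to Python, which is the same reason the takeaways give for keeping computation close to the data.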