Talks - Alex Monahan, Gabor Szarnyas: Python and SQL: Better Together, Powered by DuckDB

Learn how DuckDB brings SQL and Python together, delivering 10-100x performance gains for analytical workloads through vectorized processing, seamless data format handling, and memory-aware execution that can spill to disk.

Key takeaways
  • DuckDB is an analytical SQL database designed to integrate seamlessly with Python workflows, running in-process inside the application rather than as a separate client-server system

  • Key features include vectorized processing, multi-core utilization, and the ability to handle datasets larger than RAM by gracefully degrading to disk-based processing

  • Reads and writes multiple data formats (Parquet, CSV, JSON) directly in place, without a separate import or conversion step, and integrates with pandas, Arrow, and other Python ecosystem tools

  • Achieves 10-100x better performance than traditional solutions for analytical workloads by avoiding network overhead and optimizing for modern CPUs

  • Offers flexible API options: native SQL, a pandas-like data frame (relational) API, an Ibis interface, and experimental PySpark compatibility

  • Focuses on optimizing single-node performance rather than distributed computing, aiming to delay the need for a cluster deployment

  • Provides ACID transaction support, crash recovery, and persistent storage while maintaining SQLite-like simplicity of deployment

  • Runs anywhere Python runs: laptops, servers, browsers (via WebAssembly), and edge devices, with minimal dependencies

  • An MIT-licensed open-source project with over 2M monthly downloads, created by database experts to target analytical Python workloads

  • Best suited for data science/analytics workflows rather than high-concurrency transactional use cases