DuckDB: Crunching Data Anywhere, From Laptops to Servers • Gabor Szarnyas • GOTO 2024
Learn how DuckDB enables high-performance data analysis on laptops without configuration. Explore its architecture, key features, and ideal use cases for local data processing.
- DuckDB is an open-source analytical database system designed to process large datasets (10 GB to 1 TB) on end-user devices such as laptops, with zero configuration required.
- Key features include:
  - In-process execution (no client-server architecture)
  - Column-based storage optimized for analytics
  - Vectorized execution using 2,048-item vectors
  - Full SQL support with advanced features
  - Direct integration with Python (including Pandas), R, and other languages
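To make the column-based storage and vectorized execution points concrete, here is a minimal pure-Python sketch. It is not DuckDB's implementation: the function names, the sample table, and the query are hypothetical, and only the 2,048-item vector size comes from the talk summary. The idea shown is that each operator processes a fixed-size slice of a column at a time instead of a single row.

```python
from typing import Iterator, List

VECTOR_SIZE = 2048  # vector size quoted in the talk summary


def vectors(column: List[int], size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield consecutive fixed-size slices (vectors) of one column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def sum_where_greater(values: List[int], keys: List[int], threshold: int) -> int:
    """Compute SUM(values) WHERE keys > threshold, one vector at a time."""
    total = 0
    for key_vec, val_vec in zip(vectors(keys), vectors(values)):
        # Filter step: collect the positions in this vector that match.
        selected = [i for i, k in enumerate(key_vec) if k > threshold]
        # Aggregate step: consume only the selected positions.
        total += sum(val_vec[i] for i in selected)
    return total


# Column-oriented table (hypothetical data): each column is its own list,
# so a query touches only the columns it needs.
table = {
    "id": list(range(10_000)),
    "amount": [i % 7 for i in range(10_000)],
}

result = sum_where_greater(table["amount"], table["id"], threshold=9_989)
print(result)
```

Working on a batch of values per operator call amortizes per-call overhead and keeps the inner loops simple, which is what lets a compiled engine apply SIMD instructions to them.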
- Performance advantages come from:
  - Zero-copy data access
  - Automatic vectorization and SIMD optimization
  - Zone maps (min/max indexes) for efficient filtering
  - Parallel processing based on row groups
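Zone maps and row-group parallelism can be sketched together in a few lines. This is an illustration under stated assumptions, not DuckDB's code: the `RowGroup` class, the helper functions, and the row-group size of 1,000 are all hypothetical. Each row group stores the minimum and maximum of its values, a range filter skips groups whose min/max cannot match, and the surviving groups are handed to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple

ROW_GROUP_SIZE = 1000  # hypothetical size, chosen only for illustration


@dataclass
class RowGroup:
    values: List[int]
    min_value: int  # zone map: smallest value in this group
    max_value: int  # zone map: largest value in this group


def build_row_groups(column: List[int], size: int = ROW_GROUP_SIZE) -> List[RowGroup]:
    """Split a column into row groups and record min/max for each one."""
    groups = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def count_in_range(group: RowGroup, low: int, high: int) -> int:
    """Scan one row group and count values in [low, high]."""
    return sum(1 for v in group.values if low <= v <= high)


def parallel_count(groups: List[RowGroup], low: int, high: int) -> Tuple[int, int]:
    """Return (matching rows, row groups actually scanned)."""
    # Zone-map pruning: skip groups whose [min, max] cannot overlap the filter.
    candidates = [g for g in groups if g.max_value >= low and g.min_value <= high]
    # Each surviving row group is an independent unit of work.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(lambda g: count_in_range(g, low, high), candidates))
    return sum(counts), len(candidates)


timestamps = list(range(10_000))  # sorted data makes zone maps very selective
groups = build_row_groups(timestamps)
matches, scanned = parallel_count(groups, 2_500, 3_499)
print(matches, scanned, len(groups))
```

Python threads will not speed up this CPU-bound loop, so the sketch only shows how work is divided. The pruning benefit is real in any language: here only the row groups whose value range overlaps the filter are read at all.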
- Portability is achieved through:
  - Pure C++11 codebase with minimal dependencies
  - WebAssembly support for browser execution
  - Standalone file format requiring no server
- Business model:
  - MIT licensed, source owned by DuckDB Foundation
  - DuckDB Labs provides commercial support and consulting
  - MotherDuck offers cloud integration services
- Main limitations:
  - No support for multiple concurrent writers
  - Single-node execution only
  - Not suitable for transactional workloads
  - Limited to datasets that fit on a single machine (in memory or on local disk)
- Primary use cases:
  - Local data analysis and ETL
  - Reducing cloud costs through local processing
  - Educational environments
  - Building blocks for larger applications
  - Interactive data exploration