Chang She - LanceDB: lightweight billion-scale vector search for multimodal AI | PyData Global 2023
Learn how LanceDB enables billion-scale vector search for multimodal AI with fast performance, GPU acceleration, and native integration with the Arrow ecosystem.
LanceDB is an open-source, in-process vector database optimized for billion-scale vector search and multimodal AI applications.
Key technical advantages:
- Uses the Lance columnar format, which is optimized for fast random access
- Reported to be 100x faster than Parquet/ORC for AI workloads
- GPU acceleration for indexing
- Zero-copy schema evolution
- Native integration with Arrow ecosystem (Pandas, Polars, DuckDB)
Production-ready features:
- Lightweight transactions
- Versioning and rollbacks
- Time travel capabilities
- Concurrent writes
- Secondary indices
- Separation of compute and storage
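To make versioning, time travel, and rollbacks concrete, here is a minimal in-memory sketch of the idea. It is not LanceDB's implementation or API: the `VersionedTable` class and its `add`, `checkout`, and `restore` methods are hypothetical names chosen for illustration, and the sketch assumes every write commits a full new snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy append-only table: every write commits a new snapshot (illustrative only)."""

    # versions[i] holds the full row list as of version i + 1 (hypothetical layout).
    versions: list = field(default_factory=lambda: [[]])

    @property
    def version(self) -> int:
        # The latest version number; an empty table starts at version 1.
        return len(self.versions)

    def add(self, rows: list) -> int:
        # A write never mutates an old snapshot; it appends a new one.
        self.versions.append(self.versions[-1] + list(rows))
        return self.version

    def checkout(self, version: int) -> list:
        # Time travel: read the table exactly as it was at `version`.
        if not 1 <= version <= self.version:
            raise ValueError(f"unknown version {version}")
        return list(self.versions[version - 1])

    def restore(self, version: int) -> int:
        # Rollback: commit an old snapshot as the newest version,
        # so the history in between stays readable.
        self.versions.append(self.checkout(version))
        return self.version
```

In this sketch a rollback is itself a new version, so restoring never destroys the history that came after the restored snapshot.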
Flexible deployment options:
- Can run directly in the application process
- Supports S3, EBS, EFS storage
- Self-hosted on Kubernetes/VMs
- Cloud version in development
Optimized for multimodal data:
- Images, videos, text, point clouds
- Multiple vector columns
- Rich metadata filtering
- Hybrid vector and full-text search
- Built-in model registry for embeddings
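As a rough illustration of vector search combined with metadata filtering, the sketch below ranks rows by cosine similarity after applying a filter. It uses brute-force search over plain Python lists rather than LanceDB's indexes or API; the `search` function, its `where` parameter, and the sample rows are all assumptions made up for this example.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(rows, query, k=2, where=None):
    """Brute-force nearest-neighbour search with an optional metadata filter."""
    # Filter on metadata first, then rank only the surviving rows.
    candidates = [r for r in rows if where is None or where(r)]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["vector"], query),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical multimodal rows: one embedding plus a metadata column each.
rows = [
    {"id": 1, "kind": "image", "vector": [1.0, 0.0]},
    {"id": 2, "kind": "text", "vector": [0.9, 0.1]},
    {"id": 3, "kind": "image", "vector": [0.0, 1.0]},
]

hits = search(rows, query=[1.0, 0.0], k=2, where=lambda r: r["kind"] == "image")
print([h["id"] for h in hits])  # [1, 3]
```

Without the `where` filter, the same query returns rows 1 and 2, since the text row is closer to the query than the second image row.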
Cost-effective solution compared to other vector databases:
- Open source reduces licensing costs
- Single-node architecture simplifies operations
- Direct S3 integration for cost-optimized storage
- Easy migration with a 2-line conversion from existing formats