Bobur Umurzokov - Build AI-powered data pipeline without vector databases | PyData Global 2023
Learn how to build efficient AI data pipelines without vector databases using real-time indexing. Discover solutions for API limits, costs, latency & security challenges.
Building AI data pipelines without vector databases is possible by indexing and processing data in real time.
Key challenges in LLM applications include:
- OpenAI API limitations (no SLAs, token restrictions)
- High costs for processing large documents
- Latency issues
- Difficulties with offline testing
- Security and compliance concerns
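To make the token and cost points concrete, here is a minimal sketch of splitting a large document to fit a token limit and estimating what it would cost to send. It assumes one whitespace-separated word is roughly one token, which is only a crude approximation, and the price used is a hypothetical placeholder, not a real OpenAI rate.

```python
def approx_tokens(text: str) -> int:
    """Approximate the token count (assumption: one word is about one token)."""
    return len(text.split())


def chunk_by_budget(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def estimated_cost(text: str, price_per_1k_tokens: float) -> float:
    """Estimate the cost of sending the whole text at a given price."""
    return approx_tokens(text) / 1000 * price_per_1k_tokens


# Hypothetical price, chosen only to keep the arithmetic easy to follow.
HYPOTHETICAL_PRICE_PER_1K = 0.5

document = "expense " * 2500  # stand-in for a large document
chunks = chunk_by_budget(document, max_tokens=1000)
print(len(chunks), estimated_cost(document, HYPOTHETICAL_PRICE_PER_1K))
```

Sending every chunk of every document scales cost with document size, which is why retrieving only the relevant chunks before calling the model matters.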
The Pathway framework offers solutions for:
- Real-time data processing and indexing
- Built-in connectors for various data sources (APIs, PDFs, CSVs)
- User permission management
- Streaming data capabilities
- Integration with existing data pipelines
Architecture simplification is achieved by:
- Eliminating vector databases
- Real-time prompt engineering
- Direct indexing of vector embeddings
- Streamlined data processing pipeline
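The idea of indexing embeddings directly, with no separate vector database, can be sketched in plain Python. This is not Pathway's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `LiveIndex` and `build_prompt` are hypothetical names introduced for illustration. The point is only that the index lives in the pipeline's memory and is updated as documents arrive, change, or disappear.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LiveIndex:
    """In-memory index that is updated as documents change."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Re-embed on every change so queries always see fresh data.
        self._docs[doc_id] = (text, embed(text))

    def remove(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def query(self, question: str, k: int = 2) -> list[tuple[str, str]]:
        q = embed(question)
        ranked = sorted(
            self._docs.items(),
            key=lambda item: cosine(q, item[1][1]),
            reverse=True,
        )
        return [(doc_id, text) for doc_id, (text, _) in ranked[:k]]


def build_prompt(index: LiveIndex, question: str, k: int = 2) -> str:
    """Assemble a prompt from the most relevant documents at query time."""
    context = "\n".join(text for _, text in index.query(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


index = LiveIndex()
index.upsert("a", "travel expense report for march")
index.upsert("b", "discount on laptops this week")
print(build_prompt(index, "laptops discount", k=1))
```

Because the prompt is built at query time from whatever the index currently holds, a document updated a moment ago is reflected in the next answer without a separate re-indexing job.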
Benefits include:
- Lower operational costs
- Faster time to market
- Simplified development process
- Real-time alerting capabilities
- Easy integration with existing infrastructure
Practical applications demonstrated:
- Expense report summarization from Dropbox files
- Real-time discount monitoring
- Security information detection
- Employee performance tracking
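As a rough illustration of the discount-monitoring case, the sketch below scans a stream of price events and emits an alert whenever a discount reaches a threshold. The event fields (`product`, `list_price`, `price`) and the 20% threshold are assumptions made for the example, not details from the talk.

```python
from typing import Iterable, Iterator


def discount_alerts(
    events: Iterable[dict], min_discount: float = 0.2
) -> Iterator[str]:
    """Yield an alert for each event whose discount meets the threshold."""
    for event in events:
        discount = 1 - event["price"] / event["list_price"]
        if discount >= min_discount:
            yield f"{event['product']}: {discount:.0%} off"


# Hypothetical events; in a real pipeline these would arrive continuously.
sample_events = [
    {"product": "laptop", "list_price": 1000, "price": 750},
    {"product": "mouse", "list_price": 20, "price": 19},
]

for alert in discount_alerts(sample_events):
    print(alert)
```

Since `discount_alerts` is a generator, it handles each event as it arrives, so alerts are produced without waiting for the full input.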
The system supports both batch and streaming modes, and switching between them is easy.
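One way to picture switching between the two modes is to write the transformation once and choose only how its results are consumed. The sketch below is a plain-Python illustration under that assumption; `running_totals`, `run`, and the `mode` argument are hypothetical names and do not reflect the framework's actual API.

```python
from typing import Iterable, Iterator


def running_totals(
    expenses: Iterable[tuple[str, float]]
) -> Iterator[dict[str, float]]:
    """Yield the per-employee totals after each incoming expense."""
    totals: dict[str, float] = {}
    for employee, amount in expenses:
        totals[employee] = totals.get(employee, 0.0) + amount
        yield dict(totals)  # copy, so earlier snapshots stay unchanged


def run(expenses: Iterable[tuple[str, float]], mode: str = "batch"):
    """Run the same logic in either mode; only result delivery differs."""
    updates = running_totals(expenses)
    if mode == "streaming":
        return updates  # lazy iterator: one snapshot per input record
    if mode == "batch":
        final: dict[str, float] = {}
        for final in updates:  # consume everything, keep the last snapshot
            pass
        return final
    raise ValueError(f"unknown mode: {mode}")


data = [("ana", 10.0), ("bo", 5.0), ("ana", 20.0)]
print(run(data, mode="batch"))
for snapshot in run(iter(data), mode="streaming"):
    print(snapshot)
```

Only the `mode` argument changes between the two calls, so the transformation logic does not need to be rewritten to move from testing on a fixed file to running on live data.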