Bobur Umurzokov - Build AI-powered data pipeline without vector databases | PyData Global 2023

Learn how to build efficient AI data pipelines without vector databases using real-time indexing. Discover solutions for API limits, costs, latency & security challenges.

Key takeaways

Building AI data pipelines without vector databases is possible using real-time indexing and processing
Key challenges in LLM applications include:
- OpenAI API limitations (no SLAs, token restrictions)
- High costs for processing large documents
- Latency issues
- Difficulties with offline testing
- Security and compliance concerns
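The token-limit and cost points above can be made concrete with a small sketch. It splits a large document into chunks that fit a token budget and estimates the processing cost. Everything here is an assumption for illustration: tokens are approximated by whitespace-separated words (real tokenizers count differently), and `chunk_text`, `estimate_cost`, and the price value are hypothetical names and numbers, not part of the OpenAI API or of Pathway.

```python
def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def estimate_cost(chunks: list[str], price_per_1k_tokens: float) -> float:
    """Estimate the cost of sending every chunk, using the same word-count proxy."""
    total_tokens = sum(len(chunk.split()) for chunk in chunks)
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    document = "expense report " * 2500  # roughly 5,000 words
    chunks = chunk_text(document, max_tokens=1000)
    # The price below is a placeholder, not a real rate.
    print(len(chunks), estimate_cost(chunks, price_per_1k_tokens=0.002))
```

Sending only the chunks relevant to a question, rather than the whole document, is what keeps both token usage and cost down.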
The Pathway framework offers solutions for:
- Real-time data processing and indexing
- Built-in connectors for various data sources (APIs, PDFs, CSVs)
- User permission management
- Streaming data capabilities
- Integration with existing data pipelines
Architecture simplification is achieved by:
- Eliminating vector databases
- Building prompts in real time from freshly indexed data
- Indexing vector embeddings directly in the pipeline
- Streamlined data processing pipeline
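The idea of indexing embeddings directly in the pipeline, instead of in a separate vector database, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Pathway's API: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `InMemoryIndex` and `build_prompt` are made-up names.

```python
import math
import zlib

DIM = 64  # size of the toy embedding vector (arbitrary choice)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class InMemoryIndex:
    """Keeps document vectors in process memory and updates them as data changes."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[str, list[float]]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Re-embed on every change so queries always see the latest version.
        self._docs[doc_id] = (text, toy_embed(text))

    def remove(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def query(self, question: str, k: int = 3) -> list[tuple[str, str]]:
        q = toy_embed(question)
        # Vectors are normalized, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, text)
            for doc_id, (text, vec) in self._docs.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


def build_prompt(index: InMemoryIndex, question: str) -> str:
    """Assemble a prompt from the most similar documents at query time."""
    context = "\n".join(text for _, text in index.query(question))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because the index lives inside the pipeline process, a changed or deleted file only needs an `upsert` or `remove` call, with no separate database to keep in sync. A linear scan like this one only suits small document sets.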
Benefits include:
- Lower operational costs
- Faster time to market
- Simplified development process
- Real-time alerting capabilities
- Easy integration with existing infrastructure
Practical applications demonstrated:
- Expense report summarization from Dropbox files
- Real-time discount monitoring
- Security information detection
- Employee performance tracking
Pathway supports both batch and streaming modes, and switching between them is easy
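One way to picture switching between batch and streaming is to write the processing logic once over any iterable and change only the data source. The sketch below is plain Python under that assumption. It does not show how Pathway implements mode switching, and `running_totals`, `run_batch`, and `run_streaming` are hypothetical names.

```python
from typing import Iterable, Iterator


def running_totals(records: Iterable[dict]) -> Iterator[dict]:
    """Shared logic: yield per-category totals after each incoming record."""
    totals: dict[str, float] = {}
    for record in records:
        category = record["category"]
        totals[category] = totals.get(category, 0.0) + record["amount"]
        yield dict(totals)  # copy, so callers get a stable snapshot


def run_batch(records: list[dict]) -> dict:
    """Batch mode: consume a finite collection and return only the final totals."""
    result: dict[str, float] = {}
    for result in running_totals(records):
        pass
    return result


def run_streaming(source: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: emit an updated snapshot as each record arrives."""
    yield from running_totals(source)


if __name__ == "__main__":
    expenses = [
        {"category": "travel", "amount": 10.0},
        {"category": "food", "amount": 5.0},
        {"category": "travel", "amount": 2.5},
    ]
    print(run_batch(expenses))
    for snapshot in run_streaming(iter(expenses)):
        print(snapshot)
```

The last streaming snapshot equals the batch result, which is what makes testing in batch mode and deploying in streaming mode practical.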
