Bobur Umurzokov - Build AI-powered data pipeline without vector databases | PyData Global 2023

Learn how to build efficient AI data pipelines without vector databases using real-time indexing. Discover solutions for API limits, costs, latency & security challenges.

Key takeaways
  • AI data pipelines can be built without a vector database by indexing and processing data in real time
  • Key challenges in LLM applications include:
    • OpenAI API limitations (no SLAs, token restrictions)
    • High costs for processing large documents
    • Latency issues
    • Difficulties with offline testing
    • Security and compliance concerns
  • The Pathway framework offers solutions for:
    • Real-time data processing and indexing
    • Built-in connectors for various data sources (APIs, PDFs, CSVs)
    • User permission management
    • Streaming data capabilities
    • Integration with existing data pipelines
  • The architecture is simplified by:
    • Eliminating the separate vector database
    • Building prompts in real time from the latest data
    • Indexing vector embeddings directly inside the pipeline
    • Streamlining the data processing pipeline
  • Benefits include:
    • Lower operational costs
    • Faster time to market
    • Simplified development process
    • Real-time alerting capabilities
    • Easy integration with existing infrastructure
  • Practical applications demonstrated:
    • Expense report summarization from Dropbox files
    • Real-time discount monitoring
    • Security information detection
    • Employee performance tracking
  • Pathway supports both batch and streaming modes, with easy switching between them
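
To make the idea of indexing embeddings directly in the pipeline more concrete, here is a minimal sketch in plain Python. It is not Pathway's API and not the speaker's code: `LiveIndex`, `embed`, `cosine`, and `build_prompt` are hypothetical names, and the bag-of-words `embed` function is only a stand-in for a real embedding model. The sketch assumes that documents arrive as (id, text) updates and that the index lives in process memory, so a changed document is searchable as soon as it is upserted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LiveIndex:
    """Hypothetical in-memory index that is updated as documents change."""

    def __init__(self) -> None:
        self._texts: dict[str, str] = {}
        self._vectors: dict[str, Counter] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Re-embed only the document that changed; no external store involved.
        self._texts[doc_id] = text
        self._vectors[doc_id] = embed(text)

    def delete(self, doc_id: str) -> None:
        self._texts.pop(doc_id, None)
        self._vectors.pop(doc_id, None)

    def query(self, question: str, k: int = 1) -> list[tuple[str, str]]:
        # Rank all documents by similarity to the question and keep the top k.
        q = embed(question)
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: cosine(q, self._vectors[doc_id]),
            reverse=True,
        )
        return [(doc_id, self._texts[doc_id]) for doc_id in ranked[:k]]


def build_prompt(index: LiveIndex, question: str, k: int = 1) -> str:
    """Assemble an LLM prompt from the documents that are current right now."""
    context = "\n".join(text for _, text in index.query(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Example: a batch load followed by a streamed update to one document.
index = LiveIndex()
for doc_id, text in [
    ("offer-1", "Running shoes discount 20 percent"),
    ("offer-2", "Winter jacket discount 10 percent"),
]:
    index.upsert(doc_id, text)

index.upsert("offer-1", "Running shoes discount 35 percent")  # streamed change
print(build_prompt(index, "discount on shoes"))
```

The same `upsert` call serves both the initial batch load and later streamed changes, which is one way to read the batch/streaming point above. A production pipeline would replace `embed` with a real model and the linear scan in `query` with an approximate nearest-neighbour structure.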