Using Vector Databases for Multimodal Embeddings and Search - Zain Hasan - NDC London 2024

Explore the potential of vector databases for multimodal embeddings and search, enabling language models to operate on images, audio, and video, and uncover the scalability and applications of this innovative technology.

Key takeaways
  • Language models on their own don’t work with multimedia data, but that limitation can be overcome by pairing them with multimodal embeddings stored in a vector database.
  • Vector databases can store multimodal data, including images, audio, and video, as well as text.
  • The database uses a machine learning model to encode the data and generate vectors for each object.
  • The vectors are then used for querying and retrieval, allowing for semantic search capabilities.
  • Vector databases can be used for retrieving multimodal data, such as images and audio files, based on queries.
  • The database can also be used for tasks such as multimodal reasoning, where it can use multiple modalities to answer a question.
  • Scalability is a critical consideration: a vector database has to handle large amounts of data and keep performing well at scale.
  • The speaker used a particular vector database in their application, but the concepts apply to other vector databases as well.
  • The future of vector databases is promising, with the potential for applications in areas such as multimodal recommender systems.
  • The speaker highlighted the importance of multimodal models, which can learn from multiple sources of data and apply that knowledge to a variety of tasks.
  • Challenges with multimodal models include scaling them to large amounts of data and ensuring they can handle data from different modalities.
  • In multimodal search, a query can include multiple modalities (e.g. text, images, audio), and the database returns results that match the query across those modalities.
  • In multimodal reasoning, the retrieved data from multiple modalities is used to answer a question, and the response itself can incorporate multiple types of data.
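
The encode, store, and query flow described in the takeaways can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the class ToyVectorStore, the file names, and the three-dimensional vectors are all hypothetical. The vectors are written by hand to stand in for the output of a multimodal encoder that maps text, images, audio, and video into one shared space, and the search is a brute-force cosine-similarity scan, whereas a production vector database would typically use an approximate nearest-neighbour index.

```python
import math

# ASSUMPTION: hand-written vectors standing in for the output of a multimodal
# encoder. A real system would compute these with a machine learning model.
OBJECTS = [
    ("text", "article about dogs", [0.9, 0.1, 0.0]),
    ("image", "dog_photo.jpg", [0.8, 0.2, 0.1]),
    ("audio", "engine_noise.wav", [0.0, 0.1, 0.9]),
    ("video", "car_race.mp4", [0.1, 0.0, 0.8]),
]


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


class ToyVectorStore:
    """Hypothetical in-memory store with brute-force cosine-similarity search."""

    def __init__(self):
        self.items = []    # (modality, name) per stored object
        self.vectors = []  # unit-length embedding per stored object

    def add(self, modality, name, vector):
        self.items.append((modality, name))
        self.vectors.append(normalize(vector))

    def search(self, query_vector, k=2):
        query = normalize(query_vector)
        scores = [sum(a * b for a, b in zip(vec, query)) for vec in self.vectors]
        # Rank all stored objects by similarity, highest first, and keep k.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.items[i], scores[i]) for i in ranked[:k]]


store = ToyVectorStore()
for modality, name, vector in OBJECTS:
    store.add(modality, name, vector)

# ASSUMPTION: hypothetical embedding of a text query such as "dogs".
results = store.search([1.0, 0.1, 0.0], k=2)
for (modality, name), score in results:
    print(f"{modality:5s} {name:20s} {score:.3f}")
```

Because every object lives in the same vector space regardless of modality, the single query above ranks the text article and the dog photo ahead of the audio and video items. That is the property that makes cross-modal retrieval possible.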