Matt Cornillon: How I found my Pokémon cards thanks to Postgres: an AI journey (PGConf.EU 2023)
Explore the journey of using Postgres and a machine learning model to detect Pokémon cards from images, with a demo of the real-life application and discussion on vector storage and manipulation.
- The speaker used a machine learning model to detect Pokémon cards from images and stored the resulting embeddings in Postgres using the pgvector extension.
- The embeddings were generated using a convolutional neural network (CNN), and cosine distance was used for similarity search.
- The speaker mentioned that the number of dimensions in each embedding is determined by the machine learning model; in this case it is 768.
- The pgvector extension allows for storing and manipulating vectors, including inserting, updating, and deleting data.
- The speaker also mentioned that the machine learning model can be improved continuously by reusing the submitted pictures as new training data.
- The speaker used Hugging Face to generate the embeddings and mentioned that it is a company offering open-source machine learning models and datasets.
- The speaker used Label Studio to create a dataset for machine learning and described it as a tool for turning a set of pictures into a properly formatted, labeled dataset.
- The speaker also mentioned that the pgvector extension offers three distance measures: cosine distance, L2 (Euclidean) distance, and inner product. IVFFlat, by contrast, is an index type used for approximate search, not a distance measure.
- The speaker showed a demo of the application and mentioned that it’s a real-life use case and not just a theoretical example.
- The speaker mentioned that similarity search with pgvector is an exact nearest neighbor search when no index is used, and an approximate nearest neighbor search when an index is used.
- The speaker also mentioned that storing raw pixels inside Postgres might not be efficient and that embeddings are a better way to store the data for similarity search.
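
To make the similarity search described above concrete, here is a minimal pure-Python sketch of cosine distance, L2 distance, and an exact nearest neighbor search. It is not the speaker's code: the function names, the card names, and the 3-dimensional vectors are hypothetical stand-ins for the 768-dimensional embeddings mentioned in the talk. In pgvector itself the equivalent comparisons are done in SQL with the `<=>` (cosine distance) and `<->` (L2 distance) operators.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


def nearest_cards(query, cards, k=1):
    """Exact nearest neighbor search: score every stored embedding, then sort.

    This mirrors what an un-indexed similarity query does: a full scan.
    """
    ranked = sorted(cards.items(), key=lambda item: cosine_distance(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings (assumption: real ones would have 768 dimensions).
cards = {
    "pikachu": [0.9, 0.1, 0.0],
    "charizard": [0.1, 0.9, 0.2],
    "bulbasaur": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of the photographed card (made up)

print(nearest_cards(query, cards, k=2))  # ['pikachu', 'charizard']
```

Because every stored vector is compared with the query, the result is exact but the cost grows linearly with the number of cards. An index trades some of that accuracy for speed, which is the exact versus approximate distinction noted in the list.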