Reinventing Speech-to-Text Transcriptions with Go and Whisper - Pratim Bhosale & Sacha Arbonel

Explore how Go and Whisper can be combined to build a Dockerized speech-to-text application, with technical details, code examples, and a link to the GitHub repository.

Key takeaways

Go and Whisper can be used for speech-to-text transcriptions with a Dockerized application.
Use the go-audio/wav package to decode WAV files, and the Go bindings for Whisper to call its C++ library.
The application, YTT, transcribes YouTube videos and uses Docker to automate the process.
Docker is necessary for building and running the application.
SurrealDB is used as the database for storing transcriptions and can be replaced with any other database.
Go bindings are required for using C++ libraries in Go programs.
Whisper.cpp is used because it is a more performant and less expensive option for Go.
Tokens have a probability attached to them in Whisper’s model.
Context is an important aspect of transcription, but this talk doesn’t dive deep into its details.
The application transcribes videos and saves the transcriptions in the database using SurrealQL, SurrealDB's query language.
The Whisper model is trained on 680K hours of supervised data from the web.
The main function handles the transcription process and returns the transcriptions.
The presentation slides and the repository link are available on GitHub.
Docker Hub is used for building the Docker image.
The fourth step involves writing the Dockerfile, which automates the process of building the application.
Go bindings are required for using functions from other programming languages in Go programs.
SurrealDB provides a unique record ID that combines the table name and an ID (in the form table:id), making it easier to store and retrieve data.
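
The WAV decoding step mentioned in the takeaways can be illustrated without any third-party package. The sketch below is not the talk's code: it uses only the Go standard library instead of a WAV decoding package, and the function names readWAV and pcm16ToFloat32 are hypothetical. It assumes uncompressed 16-bit PCM input and produces float32 samples, which is the sample format whisper.cpp works with (it expects 16 kHz mono audio).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// pcm16ToFloat32 converts little-endian signed 16-bit PCM bytes
// into float32 samples in the range [-1, 1).
func pcm16ToFloat32(data []byte) []float32 {
	samples := make([]float32, len(data)/2)
	for i := range samples {
		v := int16(binary.LittleEndian.Uint16(data[2*i:]))
		samples[i] = float32(v) / 32768.0
	}
	return samples
}

// readWAV (hypothetical helper) walks the RIFF chunks of a WAV stream and
// returns the sample rate, channel count, and decoded samples.
// Only uncompressed 16-bit PCM is handled in this sketch.
func readWAV(r io.Reader) (uint32, uint16, []float32, error) {
	var hdr [12]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return 0, 0, nil, err
	}
	if string(hdr[0:4]) != "RIFF" || string(hdr[8:12]) != "WAVE" {
		return 0, 0, nil, errors.New("not a RIFF/WAVE stream")
	}
	var rate uint32
	var channels, bits uint16
	for {
		var ch [8]byte
		if _, err := io.ReadFull(r, ch[:]); err != nil {
			return 0, 0, nil, fmt.Errorf("data chunk not found: %w", err)
		}
		size := binary.LittleEndian.Uint32(ch[4:8])
		body := make([]byte, size)
		if _, err := io.ReadFull(r, body); err != nil {
			return 0, 0, nil, err
		}
		switch string(ch[0:4]) {
		case "fmt ":
			if size < 16 {
				return 0, 0, nil, errors.New("fmt chunk too short")
			}
			format := binary.LittleEndian.Uint16(body[0:2])
			channels = binary.LittleEndian.Uint16(body[2:4])
			rate = binary.LittleEndian.Uint32(body[4:8])
			bits = binary.LittleEndian.Uint16(body[14:16])
			if format != 1 || bits != 16 {
				return 0, 0, nil, errors.New("only 16-bit PCM is supported")
			}
		case "data":
			if bits == 0 {
				return 0, 0, nil, errors.New("data chunk before fmt chunk")
			}
			return rate, channels, pcm16ToFloat32(body), nil
		}
		// Chunks are word-aligned: skip the pad byte after an odd-sized chunk.
		if size%2 == 1 {
			if _, err := io.CopyN(io.Discard, r, 1); err != nil {
				return 0, 0, nil, err
			}
		}
	}
}

func main() {
	// Build a tiny in-memory WAV (16 kHz, mono, two samples) to exercise the reader.
	pcm := []byte{0x00, 0x40, 0x00, 0xC0} // samples 16384 and -16384
	var buf bytes.Buffer
	buf.WriteString("RIFF")
	binary.Write(&buf, binary.LittleEndian, uint32(36+len(pcm)))
	buf.WriteString("WAVEfmt ")
	binary.Write(&buf, binary.LittleEndian, uint32(16))      // fmt chunk size
	binary.Write(&buf, binary.LittleEndian, []uint16{1, 1})  // PCM format, mono
	binary.Write(&buf, binary.LittleEndian, uint32(16000))   // sample rate
	binary.Write(&buf, binary.LittleEndian, uint32(32000))   // byte rate
	binary.Write(&buf, binary.LittleEndian, []uint16{2, 16}) // block align, bits per sample
	buf.WriteString("data")
	binary.Write(&buf, binary.LittleEndian, uint32(len(pcm)))
	buf.Write(pcm)

	rate, channels, samples, err := readWAV(&buf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rate, channels, samples) // prints: 16000 1 [0.5 -0.5]
}
```

In a real pipeline the resulting sample slice would be handed to the Whisper bindings for transcription; audio at a different sample rate or with more than one channel would need resampling or downmixing first, which this sketch does not attempt.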
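
The record ID format described in the last takeaway (table name plus ID) can be modelled in a few lines of Go. This is a minimal sketch rather than the SurrealDB client API: the RecordID type, the ParseRecordID function, and the table name "transcription" are all hypothetical, and only simple IDs of the form table:id are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// RecordID is a hypothetical helper type modelling a simple
// record ID of the form "table:id".
type RecordID struct {
	Table string
	ID    string
}

// String renders the record ID in "table:id" form.
func (r RecordID) String() string {
	return r.Table + ":" + r.ID
}

// ParseRecordID splits "table:id" at the first colon.
// More complex ID forms are deliberately out of scope for this sketch.
func ParseRecordID(s string) (RecordID, error) {
	table, id, ok := strings.Cut(s, ":")
	if !ok || table == "" || id == "" {
		return RecordID{}, fmt.Errorf("invalid record ID %q", s)
	}
	return RecordID{Table: table, ID: id}, nil
}

func main() {
	// "transcription" is an assumed table name, used only for illustration.
	rid := RecordID{Table: "transcription", ID: "abc123"}
	fmt.Println(rid) // prints: transcription:abc123

	parsed, err := ParseRecordID("transcription:abc123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed.Table, parsed.ID) // prints: transcription abc123
}
```

Because the table name travels with the ID, a stored transcription can be fetched later from the ID alone, without tracking separately which table it belongs to.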
