Reinventing Speech-to-Text Transcriptions with Go and Whisper - Pratim Bhosale & Sacha Arbonel
Explore the power of Go and Whisper in reimagining speech-to-text transcription with a Dockerized application. The talk covers technical details, code examples, and a link to the GitHub repository.
- Go and Whisper can be combined for speech-to-text transcription in a Dockerized application.
- Use the go-audio/wav package to decode WAV files, and the Go bindings for whisper.cpp to call its C++ library.
- The application YTT is built to transcribe YouTube videos and uses Docker to automate the process.
- Docker is necessary for building and running the application.
- SurrealDB is used as the database for storing transcriptions; it could be replaced with any other database.
- Go bindings are required to use C++ libraries from Go programs.
- whisper.cpp is used because it is a performant and inexpensive way to run Whisper from Go.
- Each token in Whisper's output has a probability attached to it.
- Context is an important aspect of transcription, but this talk doesn’t dive deep into its details.
- The application transcribes videos and saves the transcriptions in the database using SurrealQL, SurrealDB's query language.
- The Whisper model is trained on 680K hours of supervised data from the web.
- The main function handles the transcription process and returns the transcriptions.
- The presentation slides and the repository link are available on GitHub.
- Docker Hub is used for building the Docker image.
- The fourth step is writing the Dockerfile, which automates building the application.
- More generally, Go bindings are needed whenever a Go program calls functions written in another programming language.
- SurrealDB gives each record a unique ID that combines the table name and the ID (in the form table:id), which makes data easier to store and retrieve.
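The talk decodes WAV files with a library before handing the audio to whisper.cpp, which works on 16 kHz mono float32 samples. To show roughly what that decoding step involves, here is a minimal standard-library sketch, assuming 16-bit PCM mono input; `decodePCM16Mono` is a hypothetical helper, not the library's API, and it does no resampling and does not handle the WAVE_FORMAT_EXTENSIBLE header variant.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodePCM16Mono is a hypothetical, minimal WAV reader: it walks the RIFF
// chunks, checks that the "fmt " chunk describes 16-bit PCM mono audio, and
// converts the "data" chunk to float32 samples in [-1, 1).
// It returns the samples and the sample rate (it does not resample).
func decodePCM16Mono(b []byte) ([]float32, uint32, error) {
	if len(b) < 12 || string(b[0:4]) != "RIFF" || string(b[8:12]) != "WAVE" {
		return nil, 0, errors.New("not a RIFF/WAVE file")
	}
	var rate uint32
	fmtSeen := false
	for off := 12; off+8 <= len(b); {
		id := string(b[off : off+4])
		size := int(binary.LittleEndian.Uint32(b[off+4 : off+8]))
		body := off + 8
		if body+size > len(b) {
			return nil, 0, errors.New("truncated chunk")
		}
		switch id {
		case "fmt ":
			if size < 16 {
				return nil, 0, errors.New("short fmt chunk")
			}
			// Layout: format(2) channels(2) rate(4) byteRate(4) blockAlign(2) bits(2).
			format := binary.LittleEndian.Uint16(b[body:])
			channels := binary.LittleEndian.Uint16(b[body+2:])
			rate = binary.LittleEndian.Uint32(b[body+4:])
			bits := binary.LittleEndian.Uint16(b[body+14:])
			if format != 1 || channels != 1 || bits != 16 {
				return nil, 0, fmt.Errorf("unsupported format=%d channels=%d bits=%d", format, channels, bits)
			}
			fmtSeen = true
		case "data":
			if !fmtSeen {
				return nil, 0, errors.New("data chunk before fmt chunk")
			}
			samples := make([]float32, size/2)
			for i := range samples {
				v := int16(binary.LittleEndian.Uint16(b[body+2*i:]))
				samples[i] = float32(v) / 32768 // scale int16 to [-1, 1)
			}
			return samples, rate, nil
		}
		// Chunks are padded to an even number of bytes.
		off = body + size + size%2
	}
	return nil, 0, errors.New("no data chunk found")
}

func main() {
	_, _, err := decodePCM16Mono([]byte("not a wav"))
	fmt.Println(err) // not a RIFF/WAVE file
}
```

If the source audio is not already 16 kHz mono, it would need to be converted (for example during download) before this step.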
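Because each token in Whisper's output carries a probability, an application can use those values to flag words worth double-checking. The sketch below assumes a hypothetical `Token` type with `Text` and `P` fields; the real Go bindings expose token data under their own names, so treat this as an illustration of the idea rather than their API.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for a transcribed token: its text and the
// probability the model assigned to it.
type Token struct {
	Text string
	P    float32
}

// lowConfidence returns the tokens whose probability is below threshold,
// e.g. to highlight words a reviewer should double-check.
func lowConfidence(tokens []Token, threshold float32) []Token {
	var out []Token
	for _, t := range tokens {
		if t.P < threshold {
			out = append(out, t)
		}
	}
	return out
}

// meanProbability averages token probabilities as a rough confidence score
// for a whole segment; it returns 0 for an empty slice.
func meanProbability(tokens []Token) float32 {
	if len(tokens) == 0 {
		return 0
	}
	var sum float32
	for _, t := range tokens {
		sum += t.P
	}
	return sum / float32(len(tokens))
}

func main() {
	seg := []Token{{" Hello", 0.98}, {" Go", 0.41}, {"pher", 0.87}}
	fmt.Println(lowConfidence(seg, 0.5))      // [{ Go 0.41}]
	fmt.Printf("%.2f\n", meanProbability(seg)) // 0.75
}
```

The threshold of 0.5 is an arbitrary illustrative value, not one taken from the talk.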
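A SurrealDB record ID is written as `table:id`, so the table name travels with every ID. The sketch below shows one way a Go program might split and rebuild such IDs; `RecordID`, `parseRecordID`, and the `transcription` table name are hypothetical, and the parser only handles simple IDs (not escaped identifiers or array/object IDs).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// RecordID is a hypothetical Go representation of a SurrealDB record ID,
// which combines a table name and an ID, written as "table:id".
type RecordID struct {
	Table string
	ID    string
}

// String renders the ID back into its "table:id" form.
func (r RecordID) String() string { return r.Table + ":" + r.ID }

// parseRecordID splits a simple "table:id" string at its first colon.
// It does not handle escaped identifiers or complex (array/object) IDs.
func parseRecordID(s string) (RecordID, error) {
	table, id, ok := strings.Cut(s, ":")
	if !ok || table == "" || id == "" {
		return RecordID{}, errors.New("record ID must look like table:id")
	}
	return RecordID{Table: table, ID: id}, nil
}

func main() {
	rid, err := parseRecordID("transcription:abc123")
	if err != nil {
		panic(err)
	}
	fmt.Println(rid.Table, rid.ID) // transcription abc123
}
```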