How to Automatic Speech Recognition (ASR)? - VB
Learn how to build Automatic Speech Recognition (ASR) models using an encoder-decoder architecture built from convolutional and recurrent neural networks, and how to use Hugging Face libraries and wav2vec 2.0 for pre-trained models and demos.
- Automatic Speech Recognition (ASR) is the process of converting spoken words into text while accounting for contextual and phonetic variations.
- The speech waveform is split into 10-millisecond snippets, and features such as MFCCs (mel-frequency cepstral coefficients) are extracted from each one.
- An encoder-decoder architecture, comprising convolutional and recurrent neural networks, processes the speech.
- Connectionist temporal classification (CTC) is used to align the long input sequence of audio frames with the much shorter output sequence of characters.
- The model can learn to recognize words and phrases despite variations in pronunciation and dialect.
- Hugging Face and wav2vec 2.0 are two prominent resources for ASR, providing pre-trained models and demos.
- The billion-dollar question in ASR is how to reconcile multiple alignments and variations in human speech.
- Applications of ASR include voice assistants, messaging, and transcribing phone calls.
- The speaker invites feedback and questions, offering to share the demo code and links to further information.
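
Two of the steps summarized above, cutting the waveform into 10-millisecond snippets and reconciling multiple alignments with CTC, can be illustrated with a minimal sketch. This is not the speaker's demo code. The 16 kHz sample rate, the non-overlapping frames, the `_` blank symbol, and the function names `frame_signal` and `ctc_greedy_collapse` are all assumptions made for illustration. Real pipelines often use overlapping windows and take the per-frame labels from a trained network.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def frame_signal(samples, sample_rate=16000, frame_ms=10):
    """Split samples into non-overlapping frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    # Drop any trailing partial frame so every frame has the same length.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# One second of silence at the assumed 16 kHz sample rate.
samples = [0.0] * 16000
frames = frame_signal(samples)

# Two different frame-level alignments (made up for this example)
# that stand for the same word spoken at different speeds.
alignment_a = list("hh_e_ll_lloo")
alignment_b = list("h_eel_l__o_")
text_a = ctc_greedy_collapse(alignment_a)
text_b = ctc_greedy_collapse(alignment_b)
```

Both alignments collapse to the same string, which is the sense in which CTC treats many frame-level alignments as one transcription. The blank between the two `l` runs is what keeps the double letter from being merged into a single `l`.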