Popular Pretrained Models for ASR
Pretrained models are typically trained on diverse datasets, making their performance less optimized for specific domain tasks. In such cases, adapting the model to the complications of the target domain through fine-tuning becomes crucial for achieving task-specific proficiency.
Model |
Description |
---|---|
Whisper |
Whisper ASR is an automatic speech recognition system developed by OpenAI. It utilizes a Seq2Seq model with a combination of convolutional and recurrent neural network layers. |
Listen, Attend, and Spell (LAS) |
LAS is a Seq2Seq model with an attention mechanism designed for automatic speech recognition. It has been used successfully in various ASR applications. |
Conformer |
Conformer is an attention-based sequence-to-sequence model that combines convolutional and transformer layers. It has shown strong performance in various ASR tasks. |
DeepSpeech |
DeepSpeech, developed by Baidu Research, is an end-to-end automatic speech recognition system based on deep learning. It uses a Seq2Seq model with a Connectionist Temporal Classification (CTC) loss. |
Automatic Speech Recognition using Whisper
Automatic Speech Recognition (ASR) can be simplified as artificial intelligence transforming spoken language into text. Its historical journey dates back to a time when developing ASR posed significant challenges. Addressing diverse factors such as variations in voices, accents, background noise, and speech patterns proved to be formidable obstacles.