Popular Pretrained Models for ASR

Pretrained models are typically trained on broad, diverse datasets, so their performance is rarely optimal for a specific domain task. In such cases, adapting the model to the characteristics of the target domain through fine-tuning becomes crucial for achieving task-specific proficiency.

| Model | Description |
|---|---|
| Whisper | Whisper is an automatic speech recognition system developed by OpenAI. It is an encoder-decoder Transformer trained on 680,000 hours of multilingual and multitask supervised audio data collected from the web. |
| Listen, Attend, and Spell (LAS) | LAS is a Seq2Seq model with an attention mechanism designed for automatic speech recognition. It has been used successfully in various ASR applications. |
| Conformer | Conformer is an encoder architecture that combines convolution modules with self-attention, capturing both local and global dependencies in speech. It has shown strong performance across a variety of ASR tasks. |
| DeepSpeech | DeepSpeech, developed by Baidu Research, is an end-to-end automatic speech recognition system based on deep learning. It uses recurrent neural network layers trained with a Connectionist Temporal Classification (CTC) loss. |
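The CTC objective used by DeepSpeech lets a model learn from per-frame character predictions without frame-level alignments. Below is a minimal PyTorch sketch of computing a CTC loss on random tensors; the shapes, label inventory, and values are illustrative assumptions, not taken from DeepSpeech itself.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: time steps, batch size, output classes
# (28 character labels plus index 0 reserved for the CTC blank).
T, N, C = 50, 4, 29

# Fake per-frame log-probabilities, as an acoustic model would emit.
log_probs = torch.randn(T, N, C).log_softmax(dim=2)

# Fake target transcripts of length 20 (label indices 1..C-1).
targets = torch.randint(1, C, (N, 20), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```

In a real system the log-probabilities would come from the network's output layer, and decoding (greedy or beam search) would collapse repeated characters and remove blanks to produce the transcript.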

Automatic Speech Recognition using Whisper

Automatic Speech Recognition (ASR) can be described simply as artificial intelligence that transforms spoken language into text. Historically, building reliable ASR systems was a significant challenge: variations in voices, accents, background noise, and speech patterns all proved formidable obstacles.
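As a quick illustration, here is a minimal transcription sketch using the open-source openai-whisper package; the checkpoint name and the audio path "audio.wav" are assumptions for illustration, not fixed requirements.

```python
import whisper  # pip install openai-whisper

# Load a pretrained checkpoint; "base" trades accuracy for speed.
model = whisper.load_model("base")

# "audio.wav" is a hypothetical local file path used for illustration.
result = model.transcribe("audio.wav")
print(result["text"])
```

Larger checkpoints such as "small", "medium", and "large" generally improve accuracy at the cost of memory and latency, which is the usual consideration when choosing a pretrained model for a target domain.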
