An In-Depth Analysis of Automatic Speech Recognition System
Abstract
Recently, great strides have been made in automatic speech recognition through the application of the hidden Markov model and various end-to-end deep learning models. An automatic speech recognition system converts human speech into text or control signals using intelligent methods, and it plays an essential part in many biometric authentication systems and voice-controlled automation systems. In this study, we review the benefits and limitations of the hidden Markov model-based design and of three end-to-end approaches: connectionist temporal classification-based, recurrent neural network-transducer, and attention-based. We then point out their respective advantages and disadvantages and possible future improvements to the end-to-end approaches. We also report the factors that can potentially influence the accuracy of automatic speech recognition. We therefore believe this paper will serve as a suitable starting point for academics interested in automatic speech recognition.