The Use of Neural Networks to Improve the Recognition Accuracy of Plosive and Unvoiced Phonemes in the Uzbek Language
Abstract
Speech recognition systems are becoming increasingly widespread, especially in applications where spoken dialogue is the most convenient means of controlling and exchanging information with technical equipment. Building an effective voice control system is currently an important task, and it requires methods that achieve high recognition accuracy for voice commands. Under these conditions, besides reducing the effect of noise on recognition quality, the goal is to improve the accuracy of the voice control system, that is, to increase the probability of correct command recognition under stationary interference. The system must also recognize commands fast enough to operate in real time. In this paper, we propose an algorithm for voice control of technical equipment based on the Uzbek language. Some Uzbek sounds (plosive and unvoiced consonants) differ considerably from the corresponding sounds in other languages, so when control algorithms are developed, the recognition accuracy for these sounds falls short of the requirements. Therefore, to ensure the necessary processing speed while maintaining the required accuracy, we propose an additional normalization step combined with a reduction of the feature space. The algorithm is based on first separating out the spectral characteristics of the speech signal. The signal spectrum is then normalized, and its resolution is increased by applying a low-frequency transformation and taking the logarithm. The resulting cepstral coefficients are fed to the input of a previously trained neural network.
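The processing chain described above (spectral analysis, normalization, logarithm, cepstral coefficients, reduced feature vector for the network) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the windowing, the normalization, or the "low-frequency conversion", so the sketch assumes a Hamming window, a direct DFT magnitude spectrum, division by the total spectral magnitude as the normalization, and a DCT-II of the log spectrum truncated to the first `num_coeffs` coefficients as the feature-space reduction. All function names and parameter values (`num_coeffs=12`, `eps=1e-10`) are hypothetical.

```python
import math


def magnitude_spectrum(frame):
    """Magnitude spectrum |X[k]|, k = 0..N/2, via a direct DFT (O(N^2); fine for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(math.hypot(re, im))
    return spectrum


def normalize_spectrum(spectrum):
    """Assumed normalization: divide by total magnitude so the bins sum to 1 (removes overall gain)."""
    total = sum(spectrum)
    if total == 0.0:
        return list(spectrum)  # silent frame: leave the all-zero spectrum unchanged
    return [s / total for s in spectrum]


def cepstral_features(frame, num_coeffs=12, eps=1e-10):
    """Hypothetical pipeline: window -> spectrum -> normalize -> log -> DCT-II -> keep num_coeffs."""
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least 2 samples")
    # Assumed Hamming window to reduce spectral leakage at the frame edges.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = normalize_spectrum(magnitude_spectrum(windowed))
    # eps keeps log() finite for empty bins.
    log_spec = [math.log(s + eps) for s in spec]
    m = len(log_spec)
    if not 1 <= num_coeffs <= m:
        raise ValueError("num_coeffs must be between 1 and the number of spectral bins")
    # DCT-II of the log spectrum gives cepstral coefficients; keeping only the
    # first num_coeffs of them is the (assumed) feature-space reduction.
    return [
        sum(v * math.cos(math.pi * q * (i + 0.5) / m) for i, v in enumerate(log_spec))
        for q in range(num_coeffs)
    ]


if __name__ == "__main__":
    frame = [math.sin(0.37 * t) + 0.5 * math.cos(1.91 * t + 0.2) for t in range(64)]
    features = cepstral_features(frame)
    print(len(features))  # prints 12
```

In this sketch, dividing by the total magnitude makes the features independent of the overall input gain (up to rounding), and for a 64-sample frame the truncation keeps 12 of the 33 cepstral coefficients as the network input. A practical version would replace the quadratic DFT loop with an FFT.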