Article

Towards End-To-End Speech Recognition with Recurrent Neural Networks

Alex Graves, Google (United Kingdom); Navdeep Jaitly, Department of Computer Science, University of Toronto, Canada

2014, en


Abstract

This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
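For reference, the word error rate quoted in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the scoring tools the authors actually used are not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace only,
    with no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this quantity can exceed 1.0 when the hypothesis is much longer than the reference.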


Citations and sources

Citations: 3. Sources used: 0.

Metrics: AkademScholar