Article

Isolated Word Recognition with Audio Derivation and CNN

Jingjing Zhang, EE, Shanghai Jiao Tong University, Shanghai, China
Shuangjiu Xiao, EE, Shanghai Jiao Tong University, Shanghai, China
Huichao Zhang, UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai, China
Lan Jiang, EE, Shanghai Jiao Tong University, Shanghai, China

2017, en

Abstract

We present a speaker-independent approach to isolated word recognition that combines audio derivation with a convolutional neural network (CNN). Instead of the traditional, sophisticated phonetic features extracted from audio, we use the audio spectrogram as training data for the CNN, which turns isolated word recognition into an image recognition problem. Deep learning demands large amounts of training data, but building such corpora reduces the efficiency of the system. We therefore present an audio-level data derivation approach that makes a high recognition rate achievable with only a small amount of collected seed audio. Derivation is performed by formant perturbation, pitch shifting, time stretching, and volume perturbation, all of which preserve semantic content. The approach reduces the amount of seed data that deep learning requires for isolated word recognition. Results show a significant accuracy improvement with derived data: only 7.57%-15.14% of the seed data is needed to reach the same level of accuracy.
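The derivation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, gain values, rates, and frame sizes are all assumptions chosen for illustration. Only volume perturbation is implemented as named in the abstract; `speed_perturb` is a crude stand-in that changes duration and pitch together, whereas the paper treats pitch shifting and time stretching as separate operations, and formant perturbation is omitted entirely.

```python
import numpy as np

def volume_perturb(signal, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    return signal * (10.0 ** (gain_db / 20.0))

def speed_perturb(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.
    Assumption: a simple stand-in that alters duration AND pitch together."""
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT with shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8)  # small offset avoids log(0)

def derive(seed, gains_db=(-6.0, 0.0, 6.0), rates=(0.9, 1.0, 1.1)):
    """Produce one derived clip per (rate, gain) combination."""
    return [volume_perturb(speed_perturb(seed, r), g)
            for r in rates for g in gains_db]

# Hypothetical seed: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
seed = 0.5 * np.sin(2 * np.pi * 440.0 * t)

variants = derive(seed)                           # 9 derived clips from 1 seed
specs = [log_spectrogram(v) for v in variants]    # image-like inputs for a CNN
```

Each entry of `specs` is a 2-D array that could be fed to an image classifier, which is the sense in which the abstract recasts word recognition as image recognition. A more faithful version would likely use a phase vocoder or a dedicated audio library to decouple pitch from duration.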

Identifiers

DOI: 10.1109/ictai.2017.00060

Citations and references

Citations: 4
References used: 0

Metrics: AkademScholar