Article

Audio augmentation for speech recognition

Tom Ko, Vijayaditya Peddinti, Daniel Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
2015, English

Abstract

Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of 4.3% was observed across the 4 tasks.
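
As a rough illustration of the speed-perturbation idea described in the abstract, the sketch below resamples a signal by each of the three speed factors (0.9, 1.0, 1.1). It is a minimal pure-Python example, not the authors' implementation: the function names `speed_perturb` and `augment` are hypothetical, and linear interpolation is an assumption chosen for brevity (a production pipeline would more likely use a proper band-limited resampler).

```python
import math


def speed_perturb(samples, factor):
    """Return `samples` resampled so it plays `factor` times faster
    at the original sample rate.

    Linear interpolation is an assumption of this sketch, not
    necessarily the resampling method used in the paper.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    # A faster signal is shorter: new length is roughly len / factor.
    out_len = max(1, int(round(len(samples) / factor)))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * factor                      # position in the original signal
        lo = min(int(math.floor(pos)), last)  # left neighbour, clamped
        hi = min(lo + 1, last)                # right neighbour, clamped
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Produce one copy of the signal per speed factor."""
    return {f: speed_perturb(samples, f) for f in factors}


# Usage: one second of a 440 Hz tone at 16 kHz (synthetic test signal).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
versions = augment(tone)
for factor, signal in versions.items():
    print(factor, len(signal) / sample_rate)  # factor and duration in seconds
```

With this kind of resampling, the 0.9 copy comes out longer than the original and the 1.1 copy shorter, while the 1.0 copy is unchanged, so the training set is effectively tripled. Because the signal is resampled rather than time-stretched, both duration and pitch shift together.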

