Article

Speech data collection system for Kazakh language

D. Kuanyshbay, Computer Science department, Suleyman Demirel University, Kaskelen, Kazakhstan
Olimzhon Baimuratov, Computer Science department, Suleyman Demirel University, Kaskelen, Kazakhstan
Yedilkhan Amirgaliyev, Computer Science department, Suleyman Demirel University, Kaskelen, Kazakhstan
Arailym Kuanyshbayeva, Department of postgraduate education, Kazakh Ablai Khan University of International Relations and World Languages, Almaty, Kazakhstan

2021, en

ABI

Abstract

For most low-resource languages, speech data does not exist at all. Producing speech corpora is therefore very challenging and requires a tremendous amount of time. Kazakh, owing to its limited popularity, is considered a low-resource language. This paper gives an overview of several data collection techniques and notes the issues associated with particular methods. Its main aim is to present a web-based crowdsourcing tool called “Kazakh recorder”, which is accessible on the website and is designed to make collecting Kazakh speech data quicker and more convenient. The paper also provides statistics on the people who contributed to the collection (age, gender, number of sentences). Using this tool, we have collected over 50 hours of speech data from 65 different native speakers, each of whom pronounced on average 500 sentences in Kazakh.


Identifiers

DOI: 10.1109/icecco53203.2021.9663771

Citations and references

Citations: 4
References used: 0
