Article

A Survey of Grapheme-to-Phoneme Conversion Methods

Shiyang Cheng, School of Environment and Spatial Informatics, China University of Mining and Technology, No.1 Daxue Road, Xuzhou 221000, China
Pengcheng Zhu, School of Computer Science and Technology, China University of Mining and Technology, No.1 Daxue Road, Xuzhou 221000, China
Jueting Liu, School of Computer Science and Technology, China University of Mining and Technology, No.1 Daxue Road, Xuzhou 221000, China
Zehua Wang, Department of Electrical and Computer Engineering, University of British Columbia, 2332 Main Mall, Vancouver, BC V6T 1Z4, Canada

2024, en

ABI

Abstract

Grapheme-to-phoneme (G2P) conversion is the task of converting letters (grapheme sequences) into their pronunciations (phoneme sequences). It plays a crucial role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. This paper provides a systematic overview of G2P conversion from several perspectives. It first presents the conversion methods, with detailed discussion of those based on deep learning. For each method, the key ideas, advantages, disadvantages, and representative models are summarized. The paper then covers learning strategies and multilingual G2P conversion. Next, it summarizes the commonly used monolingual and multilingual datasets, including those for Mandarin, Japanese, Arabic, etc. Two tables report the performance of various methods on the relevant datasets. After this general overview, the paper concludes with the current issues and future directions of deep learning-based G2P conversion.


Identifiers

DOI: 10.3390/app142411790

Citations and references

Citations: 2
References used: 0
