Article

Evaluating automatic syllabification algorithms for English

Yannick Marchand, Connie R. Adsett, R.I. Damper

2007 (en)

ABI

Abstract

Automatic syllabification of words is challenging, not least because the syllable is difficult to define precisely. The task is important for word modelling in the composition process of concatenative synthesis as well as in automatic speech recognition. There are two broad approaches to automatic syllabification: rule-based and data-driven. The rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm infers ‘new’ syllabifications from examples assumed to be correctly syllabified already. This paper compares the performance of the two basic approaches. However, it is difficult to determine a correct syllabification in all cases, and so to establish the quality of the ‘gold standard’ corpus used either to quantitatively evaluate the output of an automatic algorithm or as the example set on which data-driven methods crucially depend. Thus, three lexical databases of pre-syllabified words were used. Two of these lexicons hold the same 18,016 words with their corresponding syllabifications coming from independent sources, whereas the third corresponds to the 13,594 words that share the same syllabifications according to these two sources. As well as one rule-based approach (Fisher’s implementation of Kahn’s syllabification theory), three data-driven techniques are evaluated: a look-up procedure, an exemplar-based generalization technique, and syllabification by analogy (SbA). The results on the three databases show consistent and robust patterns: the data-driven techniques outperform the rule-based system in word and juncture accuracies by a very significant margin, and the best results are obtained with SbA.
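The abstract names two evaluation measures, word accuracy and juncture accuracy, and a third lexicon built from the words on which two sources agree. The sketch below illustrates one plausible reading of these ideas and is not taken from the paper. It assumes that syllabified forms are strings with "-" marking boundaries, that a juncture is any position between two adjacent letters, that juncture accuracy is the share of junctures correctly labelled as boundary or non-boundary, and that every reference word has a prediction. All function names and the toy data are hypothetical.

```python
from typing import Dict, Set


def boundary_positions(syllabified: str, sep: str = "-") -> Set[int]:
    """Return the letter offsets at which a syllable boundary is marked."""
    positions: Set[int] = set()
    offset = 0  # number of letters consumed so far
    for ch in syllabified:
        if ch == sep:
            positions.add(offset)
        else:
            offset += 1
    return positions


def word_accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of words whose whole syllabification matches the reference."""
    correct = sum(1 for word, form in gold.items() if predicted[word] == form)
    return correct / len(gold)


def juncture_accuracy(gold: Dict[str, str], predicted: Dict[str, str],
                      sep: str = "-") -> float:
    """Fraction of between-letter junctures labelled the same as the reference.

    Assumption: each juncture is either a boundary or a non-boundary, and
    both kinds count equally.
    """
    total = 0
    correct = 0
    for word, form in gold.items():
        n_letters = len(form.replace(sep, ""))
        gold_b = boundary_positions(form, sep)
        pred_b = boundary_positions(predicted[word], sep)
        for j in range(1, n_letters):  # junctures lie between letters
            total += 1
            correct += (j in gold_b) == (j in pred_b)
    return correct / total


def consensus_lexicon(source_a: Dict[str, str],
                      source_b: Dict[str, str]) -> Dict[str, str]:
    """Keep only the words that both sources syllabify identically."""
    return {w: s for w, s in source_a.items() if source_b.get(w) == s}


# Toy data, invented for illustration only.
gold = {"syllable": "syl-la-ble", "paper": "pa-per"}
predicted = {"syllable": "syl-lab-le", "paper": "pa-per"}
print(word_accuracy(gold, predicted))      # 0.5: one of two words matches
print(juncture_accuracy(gold, predicted))
```

Under these assumptions a single misplaced boundary makes the whole word wrong but costs only two junctures, which is why the two measures can diverge. Words that contain a literal hyphen would need a different separator.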


Citations and sources

Citations: 2. Sources used: 0.
