Preprint

TransLiTex: A Parallel Corpus of Translated Literary Texts

Amel Fraisse, GERIICO - Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 (Université de Lille - Campus Pont de Bois - Bât B1 niveau 1 - BP 60149 - 59653 VILLENEUVE D'ASCQ CEDEX - France)
Quoc-Tan Tran, GERIICO - Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 (Université de Lille - Campus Pont de Bois - Bât B1 niveau 1 - BP 60149 - 59653 VILLENEUVE D'ASCQ CEDEX - France)
Ronald Jenn, CECILLE - Centre d'Études en Civilisations, Langues et Lettres Étrangères - ULR 4074 (Université de Lille - Domaine universitaire Pont de Bois - Bât B, niv 1, BP 60149 - 59653 Villeneuve d'Ascq Cedex - France)
Patrick Paroubek, LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (Université Paris-Sud Bât. 507 - Rue du Belvédère - 91405 ORSAY CEDEX - France)
Shelley Fisher Fishkin, Stanford University (450 Serra Mall, Stanford, CA 94305-2004 - United States)
2018 · en
ABI

Abstract

In this paper, we present our ongoing work to create a massively parallel corpus of translated literary texts for applications in computational linguistics, translation studies and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn, published in 23 languages, including less-resourced languages. We report on the current status of the corpus, which contains five chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish and English-Russian). We evaluated the correctness of the chapter alignment by computing the percentage of words shared between the English version and each translation. The percentages are high, ranging from 43% to 64%, which indicates that the chapter alignment is largely correct.
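
The common-word check described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how words are tokenized, how the two languages are made comparable, or what the percentage is taken relative to, so everything here is an assumption: the function names `tokenize`, `common_word_percentage` and `check_alignment` are hypothetical, the tokenizer is a simple regular expression, both chapters are assumed to be already comparable as strings (for example, because they share proper names), and the percentage is computed over the distinct words of the source chapter.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def common_word_percentage(source_chapter, target_chapter):
    """Percentage of distinct source-chapter words that also occur in the target chapter."""
    source_vocab = set(tokenize(source_chapter))
    target_vocab = set(tokenize(target_chapter))
    if not source_vocab:
        return 0.0
    return 100.0 * len(source_vocab & target_vocab) / len(source_vocab)


def check_alignment(source_chapters, target_chapters):
    """For each source chapter, return the index of the target chapter with the highest overlap."""
    best = []
    for src in source_chapters:
        scores = [common_word_percentage(src, tgt) for tgt in target_chapters]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best
```

Under these assumptions, a chapter alignment would look plausible when `check_alignment` returns `i` for the `i`-th source chapter, that is, when each chapter overlaps most with its own counterpart.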


Citations and sources

Citations: 2
Sources used: 0