NLLB-Based Uzbek NMT: Leveraging Multisource Data
Abstract
Recent advances in Neural Machine Translation (NMT) have been driven largely by transformer-based architectures, exemplified by the NLLB (No Language Left Behind) model. This study extends previous work on low-resource languages by fine-tuning NLLB to address the specific challenges of Uzbek, a language that remains underrepresented in global translation models. Using a comprehensive dataset drawn from openly available internet sources, we refine our approach to gathering and using diverse linguistic data. Our results show marked improvements in translation accuracy and contextual relevance, surpassing the previous benchmarks set by BART models. The findings underscore the effectiveness of NLLB in improving translation quality not only for Uzbek but potentially for other underrepresented languages as well. This paper contributes to the ongoing discourse in NMT by highlighting the impact of cutting-edge models on linguistic inclusivity and by broadening the scope for deploying machine translation across diverse linguistic landscapes.