Article

NLLB-Based Uzbek NMT: Leveraging Multisource Data

Nilufar Abdurakhmonova, National University of Uzbekistan, Dept. of Computational and Applied Linguistics, Tashkent, Uzbekistan; Adkham Zokhirov, Mohirdev; Mukhammadali Salokhiddinov; Anvar Narzullayev, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan; Ayrat Gatiatullin, Institute of Applied Semiotics, Academy of Sciences of Tatarstan Republic, Kazan, Russia

2024 · en


Abstract

Recent advances in Neural Machine Translation (NMT) have been driven largely by transformer-based architectures, exemplified by the NLLB (No Language Left Behind) model. This study extends previous work on low-resource languages by fine-tuning the NLLB model to address the specific challenges of the Uzbek language, which remains underrepresented in global translation models. Leveraging a comprehensive dataset derived from openly available internet sources, this research refines our approach to gathering and utilizing diverse linguistic data. Our results demonstrate marked improvements in translation accuracy and contextual relevance, surpassing previous benchmarks set by BART models. The findings underscore the effectiveness of NLLB in enhancing translation quality not only for Uzbek but potentially for other underrepresented languages. This paper contributes to the ongoing discourse in NMT by highlighting the impact of cutting-edge models on linguistic inclusivity, and it broadens the scope for deploying machine translation in diverse linguistic landscapes globally.

Topics

Educational Technology and Assessment; Advanced Data Processing Techniques; Advanced Computational Techniques and Applications

Identifiers

DOI: 10.1109/ubmk63289.2024.10773423

Citations and references

Cited by 4 · 3 references
