Article

NLLB-Based Uzbek NMT: Leveraging Multisource Data

Nilufar Abdurakhmonova (National University of Uzbekistan, Dept. of Computational and Applied Linguistics, Tashkent, Uzbekistan)
Adkham Zokhirov (Mohirdev)
Mukhammadali Salokhiddinov
Anvar Narzullayev (National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan)
Ayrat Gatiatullin (Institute of Applied Semiotics, Academy of Sciences of Tatarstan Republic, Kazan, Russia)
2024 · en

Abstract

Recent advancements in Neural Machine Translation (NMT) have been largely propelled by the development of transformer-based architectures, exemplified by the NLLB (No Language Left Behind) model. This study extends previous work on low-resource languages by fine-tuning the NLLB model to address the specific challenges of the Uzbek language, which continues to have limited representation in global translation models. Leveraging a comprehensive dataset derived from openly available internet sources, this research refines our approach to gathering and utilizing diverse linguistic data. Our results demonstrate marked improvements in translation accuracy and contextual relevance, surpassing previous benchmarks set by BART models. The findings underscore the effectiveness of NLLB in enhancing translation quality not only for Uzbek but potentially also for other underrepresented languages. This paper contributes to the ongoing discourse in NMT by highlighting the impact of cutting-edge models on linguistic inclusivity and by broadening the scope for deploying machine translation in diverse linguistic landscapes globally.
