Article

Syntactic Alignment of Simple Sentence in Uzbek-English Parallel Corpora

Adilbek Dauletov; Uldona Abdurahmonova (Tashkent State University of Uzbek Language and Literature Named After Alisher Navoi, Department of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan); Noila S. Matyakubova (Alfraganus University, Language Teaching Center, Tashkent, Uzbekistan); Khurshida Bakhrieva; Dilfuza Ganixodjayeva

2026

Abstract

In natural language processing (NLP), accurate alignment of text units in a parallel corpus is important for tasks such as machine translation, cross-lingual information retrieval, and bilingual lexicon creation. However, identifying simple sentences and correctly assigning their types is difficult during alignment, especially at the paragraph and sentence levels. This article analyzes the problems encountered in identifying and matching simple sentences and their structural differences. Syntactic differences, sentence splitting or merging, and language-specific sentence structure all add to the complexity of this process. The study examines problematic cases that arise during alignment, proposes criteria for determining the type of a simple sentence, describes a rule-based alignment process intended to increase accuracy and consistency, and provides a linguistic database. The results can help improve multilingual NLP systems by raising the quality of corpus-based language resources.

Topics

Economic and Industrial Development; Education, Innovation and Language Studies; Language Acquisition and Education

Identifiers

DOI: 10.1109/iisec69317.2026.11418482

Citations and references

Citations: 0; references used: 8