Syntactic Alignment of Simple Sentences in Uzbek-English Parallel Corpora
Abstract
In natural language processing (NLP), accurate alignment of text units in a parallel corpus is important for tasks such as machine translation, cross-lingual information retrieval, and bilingual lexicon construction. However, identifying simple sentences and correctly assigning their types is difficult during alignment, especially at the paragraph and sentence levels. This article analyzes the problems encountered in identifying and matching simple sentences in Uzbek and English, together with the structural differences between them. Syntactic differences, sentence splitting or merging, and language-specific sentence structure all add to the complexity of this process. The study examines problematic cases that arise during alignment, proposes criteria for determining the type of a simple sentence, describes a rule-based alignment procedure intended to increase accuracy and consistency, and provides a linguistic database. The results are intended to improve multilingual NLP systems by raising the quality of corpus-based language resources.