Structure of the Uzbek-Turkish Parallel Corpus: Linguistic and Extralinguistic Annotation
Abstract
This article systematically analyzes the structure of the Uzbek-Turkish parallel corpus, its main components, and the processes of its linguistic and extralinguistic tagging. It outlines the stages of creating a parallel corpus: text selection, text adaptation, determination of correspondences between segments (segment alignment), and enrichment of the aligned segments with morphological, syntactic, semantic, and contextual tags. The article also emphasizes the importance of extralinguistic tagging for documenting the social, methodological, and pragmatic characteristics of a text. Finally, it examines how linguistic tagging is applied in morphological and syntactic analysis and discusses its role in linguistic research and in neural machine translation systems.