Stages of Creating an Uzbek-English Parallel Corpus and Principles of Selecting a Linguistic Base
Abstract
This conceptual study outlines the main stages of creating an Uzbek-English parallel corpus, with special emphasis on the linguistic and methodological principles for selecting base texts. It identifies and reviews criteria for including texts, such as genre diversity, representativeness, alignment accuracy, and linguistic relevance. Particular attention is given to balancing modern and classical texts and to the role of technological tools in achieving consistent sentence-level alignment. The proposed pipeline and recommendations draw on a synthesis of existing research and are intended as guidelines for corpus developers who aim to build a reliable, research-oriented bilingual resource.