A Technique for Automatic Extraction of Basis Words: A Case Study on “Uzbek Primary School Corpus”
Abstract
Extracting basis words from Uzbek-language texts is an important task that supports the learning process of school students. This study selects, from the words occurring in Uzbek-language texts, those that can be used to express almost all other words. That is, the vocabulary is reduced to a set small enough that the remaining words can still be constructed from it. A high-frequency detection method was used to identify these basis words. For the investigation, we collected 35 primary school textbooks for grades 1–4 approved by the Ministry of Preschool and School Education of the Republic of Uzbekistan; we refer to this collection as the “Uzbek Primary School Corpus” (UPSC). The results indicate that a first-grade student should know 366 basis words, a second-grade student 462, a third-grade student 486, and a fourth-grade student 512.
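As a rough illustration of what high-frequency detection can look like, the following Python sketch counts word occurrences in a text and keeps those that reach a threshold. It is not the authors' pipeline: the tokenizer, the function names `tokenize` and `high_frequency_words`, and the `min_count` threshold are all assumptions made for this example, and no stemming or lemmatization of Uzbek word forms is attempted.

```python
import re
from collections import Counter

# Assumption: tokens are runs of Latin letters plus the apostrophe-like
# marks used in Uzbek Latin orthography (as in "oʻ" and "gʻ").
TOKEN_PATTERN = re.compile(r"[a-zʻʼ'’]+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def high_frequency_words(text, min_count=2):
    """Return words occurring at least `min_count` times, most frequent first.

    `min_count` is a hypothetical cutoff chosen for illustration only.
    """
    counts = Counter(tokenize(text))
    return [word for word, count in counts.most_common() if count >= min_count]


if __name__ == "__main__":
    sample = "bola maktabga bordi. bola kitob oʻqidi. kitob yaxshi."
    print(high_frequency_words(sample, min_count=2))
```

In practice the cutoff would be chosen per grade level and applied to normalized word forms, since Uzbek is agglutinative and a single stem appears with many suffixes; the sketch treats each surface form as a distinct word.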