Article

Uzbek Public-Sector Text Classification: Naive Bayes, Logistic Regression and SVM Benchmarks

Nilufar Abdurakhmonova, National University of Uzbekistan Named After Mirzo Ulugbek, Tashkent, Uzbekistan
Nilufar Istamovna Adizova, Bukhara State University, Bukhara, Uzbekistan
Dilnoza Sobirova, Bukhara State University, Bukhara, Uzbekistan
2025
ABI

Abstract

The authors present an algorithm for classifying Uzbek-language documents in key public-sector domains: economics, law, healthcare, housing and utilities, and energy. The study compares three established linear baselines (multinomial naive Bayes, logistic regression, and linear SVM) under a fixed, source-agnostic evaluation protocol. Texts are collected from government agencies and media outlets, stripped of template (boilerplate) text, normalized for spelling variants and apostrophe forms, and vectorized with TF-IDF at the word level, at the character level, and as a hybrid of the two. Cross-domain analysis reveals systematic confusions between the housing-and-utilities and energy categories, and between the legal and economic categories. These confusions reflect shared tariff narratives, outage and maintenance reporting, and regulatory language embedded in discussions of economic indicators. Error analysis further shows that character n-grams and script normalization are critical for robustness, given Uzbek's agglutinative morphology and mixed-orthography usage. The article also describes the characteristics of the Uzbek language that must be taken into account when developing such tools.
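
The preprocessing and feature-extraction steps described in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: all apostrophe look-alikes are collapsed to an ASCII apostrophe, Cyrillic is transliterated to Latin with a simplified one-to-one table (no context rules), character n-grams of length 2 to 4 are taken inside word boundaries (loosely following scikit-learn's `char_wb` analyzer), and IDF uses scikit-learn's default smoothing, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization. The function names, the n-gram range and the toy sentences are hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter

# --- Orthographic normalization (assumed rules, not the authors' code) ---
# Apostrophe look-alikes in scraped Uzbek Latin text: typographic quotes,
# modifier letters U+02BB / U+02BC, backtick and acute accent.
APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in "\u2018\u2019\u02bb\u02bc\u0060\u00b4"})

# Simplified Cyrillic -> Latin table (lowercase only; input is lowercased first).
# Context rules such as word-initial "е" -> "ye" or "ц" -> "s" are ignored.
CYR_TO_LAT = str.maketrans({
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "ch", "ш": "sh", "ъ": "'", "ь": "",
    "э": "e", "ю": "yu", "я": "ya", "ў": "o'", "қ": "q", "ғ": "g'", "ҳ": "h",
})


def normalize(text: str) -> str:
    """Lowercase, unify apostrophes, transliterate Cyrillic, squeeze whitespace."""
    # NFC composes e.g. "у" + U+0306 (combining breve) into the single letter "ў".
    text = unicodedata.normalize("NFC", text).lower()
    text = text.translate(APOSTROPHE_TABLE).translate(CYR_TO_LAT)
    return re.sub(r"\s+", " ", text).strip()


# --- Hybrid word + character TF-IDF ---
def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> list[str]:
    """Character n-grams taken inside each space-padded word (char_wb style)."""
    grams = []
    for word in text.split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def hybrid_features(text: str) -> list[str]:
    """Word unigrams and character n-grams, prefixed to keep vocabularies apart."""
    return ["w:" + w for w in text.split()] + ["c:" + g for g in char_ngrams(text)]


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF per feature: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(hybrid_features(doc)))
    return {f: math.log((1 + n_docs) / (1 + d)) + 1.0 for f, d in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse L2-normalized TF-IDF vector; features unseen in training are dropped."""
    tf = Counter(f for f in hybrid_features(text) if f in idf)
    vec = {f: count * idf[f] for f, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()} if norm else vec


# Toy usage: mixed-script training texts and a Cyrillic query (all hypothetical).
train = [normalize(t) for t in ["Электр тарифи ошди", "Tariflar oshdi", "Sud qarori kuchga kirdi"]]
idf = fit_idf(train)
vec = tfidf_vector(normalize("Тарифларнинг ошиши"), idf)
```

In this toy run the Cyrillic query normalizes to `tariflarning oshishi`. Neither word occurs in the training texts, yet the vector is non-empty because character n-grams such as `tar` and `osh` are shared with `tarifi`, `tariflar` and `oshdi`. This is one plausible mechanism behind the abstract's finding that character features matter for agglutinative suffixation; the resulting vectors could then be passed to any of the three linear classifiers.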
