Other

Text classification dataset for Uzbek language

Elmurod Kuriyozov (Universidade da Coruña, CITIC, Grupo LYS, Depto. de Computación y Tecnologías de la Información, Facultade de Informática)
Ulugbek Salaev (Urgench State University)
Sanatbek Matlatipov (National University of Uzbekistan named after Mirzo Ulugbek)
Gayrat Matlatipov (Urgench State University)

Zenodo (CERN European Organization for Nuclear Research) · repository · 2023 · en

ABI

Abstract

We collected text data from 9 Uzbek news websites and press portals, covering both news articles and press releases. The websites were selected to span a range of categories such as politics, sports, entertainment, and technology. In total, we collected 512,750 articles containing over 120 million words across 15 distinct categories, which provides a large and diverse corpus for text classification. All text in the corpus is written in the Latin script.

<em>Categories (with the name in Uzbek):</em> Local (Mahalliy), World (Dunyo), Sport (Sport), Society (Jamiyat), Law (Qonunchilik), Tech (Texnologiya), Culture (Madaniyat), Politics (Siyosat), Economics (Iqtisodiyot), Auto (Avto), Health (Salomatlik), Crime (Jinoyat), Photo (Foto), Women (Ayollar), Culinary (Pazandachilik).

When referencing this work, please cite it as follows.

BibTeX:
<pre><code>@inproceedings{Kuriyozov2023TextCD,
  title={Text classification dataset and analysis for Uzbek language},
  author={Elmurod Kuriyozov and Ulugbek Salaev and Sanatbek Matlatipov and Gayrat Matlatipov},
  year={2023}
}</code></pre>

APA:
<pre><code>Kuriyozov, E., Salaev, U., Matlatipov, S., & Matlatipov, G. (2023). Text classification dataset and analysis for Uzbek language.</code></pre>
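For readers preparing the corpus for a classifier, the 15 category names listed above can be held in a simple label mapping. The following is a minimal sketch only: the names `UZ_TO_EN`, `LABEL_TO_ID`, and `encode_label` are hypothetical, the integer ids simply follow the order in which the categories are listed here, and nothing is assumed about how labels are actually stored in the released files.

```python
# Category names as listed in the dataset description (Uzbek -> English).
UZ_TO_EN = {
    "Mahalliy": "Local",
    "Dunyo": "World",
    "Sport": "Sport",
    "Jamiyat": "Society",
    "Qonunchilik": "Law",
    "Texnologiya": "Tech",
    "Madaniyat": "Culture",
    "Siyosat": "Politics",
    "Iqtisodiyot": "Economics",
    "Avto": "Auto",
    "Salomatlik": "Health",
    "Jinoyat": "Crime",
    "Foto": "Photo",
    "Ayollar": "Women",
    "Pazandachilik": "Culinary",
}

# Hypothetical integer ids, assigned in listing order (an assumption,
# not an encoding defined by the dataset authors).
LABEL_TO_ID = {name: idx for idx, name in enumerate(UZ_TO_EN)}


def encode_label(uzbek_name: str) -> int:
    """Return the illustrative integer id for an Uzbek category name."""
    return LABEL_TO_ID[uzbek_name]
```

A mapping like this keeps the Uzbek label as the key, so the English names are used only for display.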

Topics

Educational Technology and Assessment · Ideological and Political Education · Text and Document Classification Technologies

Identifiers

DOI: 10.5281/zenodo.7677431

Citations and references

0 citations · 0 references
