AkademIndex


Automatic Topic Detection in Large Text Data in Uzbek using Clustering Methods

Umirova Svetlana, Samarkand State Univ., Department of Uzbek Linguistics, Samarkand, Uzbekistan
Kholmuhamedov Bakhtiyor, Samarkand State Univ., Department of Uzbek Linguistics, Samarkand, Uzbekistan
Karimov Suyun Amirovich, Samarkand State Univ., Department of Uzbek Linguistics, Samarkand, Uzbekistan
Narzieva Mamura, Samarkand State Univ., Department of Uzbek Linguistics, Samarkand, Uzbekistan
2024, English
ABI

Abstract

This article presents an approach to automatic topic detection in large volumes of Uzbek text using clustering methods. The research aims to develop and apply the “bubble trap” clustering model, which segments the vector space of text documents into semantic clusters. The model keeps the volume and the center position of a cluster unchanged as new vectors are added, which ensures high accuracy and stability of the clustering. The methodology comprises preprocessing of Uzbek text data, vectorization with term frequency-inverse document frequency (TF-IDF), and clustering based on semantic similarity. Applied to Uzbek text data, the model reveals hidden patterns and structures, demonstrating its effectiveness for natural language processing and text mining. The study concludes that the “bubble trap” algorithm offers significant improvements in automatic topic detection for Uzbek, providing a reliable tool for analyzing large text corpora and opening new research directions in computational linguistics. The approach addresses the challenges of clustering unstructured text data and contributes a scalable and accurate solution for topic detection in weakly structured text corpora.
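The abstract names the pipeline stages (preprocessing, TF-IDF vectorization, similarity-based clustering) but does not give the exact rules of the “bubble trap” model. The sketch below is one possible reading in plain Python. It assumes that a bubble's "center" is the first document vector that opens it and that its "volume" is a fixed cosine-similarity threshold, so neither changes when new vectors join. The tokenizer, the smoothed IDF formula, the names `preprocess`, `tfidf`, `cosine` and `BubbleClusterer`, the threshold of 0.2 and the four sample sentences are all illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: word characters plus the apostrophe variants used in
# Latin-script Uzbek (o', g', oʻ, gʻ). No stemming or stop-word removal.
TOKEN_RE = re.compile(r"[\w'\u2018\u2019\u02bb]+")


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize one document."""
    return TOKEN_RE.findall(text.lower())


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse, L2-normalized TF-IDF vectors with a smoothed IDF (an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: (c / len(doc)) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


class BubbleClusterer:
    """Hypothetical fixed-center, fixed-radius clustering.

    Each bubble's center is the vector that opened it; its radius is the
    shared similarity threshold. Neither is updated when a vector is trapped.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.centers: list[dict[str, float]] = []

    def add(self, vec: dict[str, float]) -> int:
        sims = [cosine(vec, c) for c in self.centers]
        if sims:
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= self.threshold:
                return best  # trapped: center and radius stay unchanged
        self.centers.append(vec)  # nothing close enough: open a new bubble
        return len(self.centers) - 1


# Usage on a toy corpus with two obvious topics (sport, economy).
corpus = [
    "futbol jamoasi o'yinda g'alaba qozondi",
    "futbol jamoasi yangi murabbiy tayinladi",
    "iqtisodiyot o'sishi inflyatsiya darajasi",
    "inflyatsiya darajasi va iqtisodiyot siyosati",
]
vecs = tfidf([preprocess(d) for d in corpus])
model = BubbleClusterer(threshold=0.2)
labels = [model.add(v) for v in vecs]
print(labels)  # [0, 0, 1, 1]
```

Because the centers never move, the result depends on the order in which documents arrive; that is the trade-off for the stability the abstract describes. A practical Uzbek pipeline would likely also need normalization this sketch omits, such as Cyrillic/Latin transliteration and stop-word filtering.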

