
Lists of Karakalpak Stopwords

Jernej Vičič (University of Primorska, FAMNIT), Khabibulla Madatov (Urgench State University), Shukurla Bekchanov (Urgench State University)

Abstract

The dataset presents three lists of stopwords in the Karakalpak language. The lists were constructed by applying three automatic methods to the same corpus. The corpus, named the "Karakalpak School Corpus", was built from 23 school textbooks. It can be reconstructed from the list of URLs of all files it comprises, which is included in the dataset (list_of_urls_for_karakalpak_school_corpus.txt).

Description of the methods and the lists: A set of grammar rules and the TF-IDF algorithm were used to automatically collect a list of 4014 single-word stopwords (file: Karakalpak_stopwords_unigrams.txt). A bigram method was used to extract a list of 3740 bigrams (pairs) of stopwords (file: Karakalpak_stopwords_bigram.txt). A set of two-word collocations treated as stopwords was also extracted, giving a list of 20745 pairs (file: Karakalpak_stopwords_bigrams_with_collocations.txt).
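As a minimal sketch of how the unigram and bigram lists might be used together, the Python below loads a list and filters a token sequence. The record does not describe the file layout, so the code assumes UTF-8 text files with one entry per line and bigram entries written as two whitespace-separated words. The function names (parse_stopwords, load_stopwords, remove_stopwords) are hypothetical helpers for illustration, not part of the dataset.

```python
from pathlib import Path
from typing import Iterable


def parse_stopwords(lines: Iterable[str]) -> set[str]:
    """Return lowercased, whitespace-normalized entries, skipping blank lines.

    Assumption: one entry per line; a bigram entry is two words on one line.
    """
    return {" ".join(line.split()).lower() for line in lines if line.strip()}


def load_stopwords(path: str) -> set[str]:
    """Read one stopword list from disk (assumes UTF-8 encoding)."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_stopwords(handle)


def remove_stopwords(tokens: list[str], unigrams: set[str], bigrams: set[str]) -> list[str]:
    """Drop stopword pairs first, then single-word stopwords."""
    kept = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2]).lower()
        if i + 1 < len(tokens) and pair in bigrams:
            i += 2  # skip both words of a stopword pair
        elif tokens[i].lower() in unigrams:
            i += 1  # skip a single stopword
        else:
            kept.append(tokens[i])
            i += 1
    return kept
```

With the dataset files downloaded, the lists could be loaded with load_stopwords("Karakalpak_stopwords_unigrams.txt") and load_stopwords("Karakalpak_stopwords_bigram.txt"). Matching pairs before single words is a design choice made here so that a two-word stopword is removed as a unit; the record does not specify an order.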


Citations and references

1 citation, 0 references