Lists of Karakalpak Stopwords

Jernej Vičič (University of Primorska, FAMNIT), Khabibulla Madatov (Urgench State University), Shukurla Bekchanov (Urgench State University)

Abstract

The dataset presents three lists of stopwords in the Karakalpak language. The lists were constructed by applying three automatic methods to the same corpus. The corpus, named the "Karakalpak School Corpus", was built from 23 school textbooks. It can be reconstructed from the list of URLs of all files in the corpus, which is included in the dataset (list_of_urls_for_karakalpak_school_corpus.txt). The methods and the resulting lists are as follows. (1) A set of grammar rules and the TF-IDF algorithm were used to automatically collect a list of 4014 single-word stopwords (file: Karakalpak_stopwords_unigrams.txt). (2) A bigram method was used to extract a list of 3740 bigrams (pairs) of stopwords (file: Karakalpak_stopwords_bigram.txt). (3) A set of two-word collocations treated as stopwords was also extracted; this list has 20745 pairs (file: Karakalpak_stopwords_bigrams_with_collocations.txt).
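The abstract names TF-IDF as one ingredient of the single-word method but does not give the details. As a rough illustration only, the sketch below scores each word by its mean TF-IDF across documents and treats words at or below a threshold as stopword candidates. The function names, the threshold rule, and the English toy documents are assumptions made for this example; the authors' grammar rules and their actual selection criterion are not reproduced here.

```python
import math
from collections import Counter


def mean_tfidf(docs):
    """Return {word: mean TF-IDF over all documents} for tokenized docs.

    docs is a list of token lists. This is a generic TF-IDF variant
    (relative term frequency, natural-log IDF), chosen for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    totals = Counter()
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            totals[word] += tf * idf

    # Words absent from a document contribute 0 to that document's score.
    return {word: totals[word] / n_docs for word in df}


def stopword_candidates(docs, threshold):
    """Hypothetical selection rule: keep words whose mean TF-IDF <= threshold."""
    scores = mean_tfidf(docs)
    return sorted(w for w, s in scores.items() if s <= threshold)


# Toy English documents, used only to show the mechanics.
toy_docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "bird", "flew"],
]
print(stopword_candidates(toy_docs, threshold=0.0))
```

A word that occurs in every document has an IDF of zero, so it always falls at or below any non-negative threshold. On a real corpus the threshold would need tuning, and the result would still differ from the published lists, which also rely on grammar rules.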

Topics

Identifiers

Citations and references

Citations: 1. References used: 0.