Article

Generating Stopword List for Sanskrit Language

Jaideepsinh K. Raulji (Ahmedabad University, Ahmedabad, Gujarat, India), Jatinderkumar R. Saini
2017, en
ABI

Abstract

In an era of information explosion, optimizing the processes behind Information Retrieval, Text Summarization, and Text and Data Analytics systems is of utmost importance. To achieve accuracy, redundant words with little or no semantic meaning must be filtered out. Such words are known as stopwords. Stopword lists have been developed for languages such as English, Chinese, Arabic, and Hindi, but a standard stopword list is still missing for Sanskrit. Identifying stopwords manually in Sanskrit text is a herculean task, so this paper presents an automated stopword generator algorithm based on word frequency, along with its implementation, to ease the task. Fine-tuning the generated list still requires manual intervention by a language expert, which makes the overall approach hybrid. The paper presents the first list of its kind: seventy-five generic Sanskrit stopwords extracted from a corpus of nearly seventy-six thousand words.
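The abstract only states that the generator ranks words by frequency, so the following is a minimal sketch of that general idea and not the authors' implementation. The function name `candidate_stopwords`, the whitespace tokenization, the punctuation set (including the danda marks), and the `top_n` cutoff are all assumptions made for illustration.

```python
from collections import Counter

# Characters stripped from token edges. Treating the danda (।) and
# double danda (॥) as punctuation is an assumption about the corpus.
PUNCTUATION = "।॥.,;:!?\"'()[]-"


def candidate_stopwords(text, top_n=100):
    """Return the top_n most frequent tokens as (word, count) pairs.

    top_n is a hypothetical cutoff; the source does not state one.
    """
    tokens = []
    for raw in text.split():           # naive whitespace tokenization
        token = raw.strip(PUNCTUATION)
        if token:                      # skip tokens that were only punctuation
            tokens.append(token)
    return Counter(tokens).most_common(top_n)


# Tiny made-up sample, not taken from the paper's corpus.
sample = "च रामः च वनं गच्छति । सीता च गच्छति ।"
print(candidate_stopwords(sample, top_n=2))
```

The output of such a ranking is only a candidate list. As the abstract notes, a language expert would still have to review it, since frequent words are not necessarily devoid of meaning.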

Citations and sources

Citations: 2. References used: 0.