Generating Stopword List for Sanskrit Language
Abstract
In an era of information explosion, optimizing the processes behind Information Retrieval, Text Summarization, and Text and Data Analytics systems is of utmost importance. To achieve accuracy, redundant words that carry little or no semantic meaning must be filtered out. Such words are known as stopwords. Stopword lists have been developed for languages such as English, Chinese, Arabic, and Hindi, but a standard stopword list is still missing for Sanskrit. Identifying stopwords manually in Sanskrit text is a herculean task; this paper therefore presents an automated stopword generation algorithm based on word frequency, together with its implementation, to ease that task. Fine-tuning the generated list still requires manual intervention by a language expert, so the overall approach is hybrid. The paper presents the first list of its kind: seventy-five generic Sanskrit stopwords extracted from a corpus of nearly seventy-six thousand words.
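The frequency-based idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details the abstract does not give. The function names `candidate_stopwords` and `refine_with_expert`, the whitespace tokenization (with no sandhi splitting), the punctuation set, and the `top_n` cutoff are all assumptions made for illustration.

```python
from collections import Counter

# Punctuation to strip from token edges, including the Devanagari
# danda and double danda. This set is an illustrative assumption.
PUNCTUATION = "।॥.,;:!?\"'()[]"


def candidate_stopwords(text, top_n=75):
    """Return the top_n most frequent tokens as stopword candidates.

    Assumes plain whitespace tokenization; real Sanskrit text would
    likely need sandhi splitting first, which is not attempted here.
    """
    tokens = []
    for raw in text.split():
        word = raw.strip(PUNCTUATION)
        if word:
            tokens.append(word)
    counts = Counter(tokens)
    return [word for word, _ in counts.most_common(top_n)]


def refine_with_expert(candidates, rejected):
    """Drop candidates that a language expert marked as meaningful.

    Models the manual step of the hybrid approach; `rejected` is a
    hypothetical set of words supplied by the expert.
    """
    return [word for word in candidates if word not in rejected]
```

A call such as `candidate_stopwords(corpus_text, top_n=75)` would produce the raw frequency-ranked list, and `refine_with_expert` would then remove any frequent but semantically significant words that the expert flags.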