Automatic Construction of Generic Stop Words List for Hindi Text
Abstract
Hindi text on the web is growing rapidly, attracting many users and researchers who want to retrieve useful information from this data. In the Information Retrieval (IR) process, users encounter many words of little or no semantic importance, called stop words. Removing such common words can lead to more effective indexing of the corpus and improve the performance of IR systems. Constructing a stop word list is therefore a basic requirement of any IR process. Many stop word lists have been created for English, Chinese, and European languages, but no such standard list for Hindi is available on the Internet. To save users the time and overhead of picking stop words manually, we implement an aggregation model based on the social choice theory of elections. In the aggregation model, both statistical and information-based methods are used to construct a stop word list for Hindi. Results are presented in tables and graphs.
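As a rough illustration of the aggregation idea, the sketch below ranks terms with one statistical method and one information-based method and then merges the two rankings with a voting rule. The abstract does not name the specific methods or the voting rule, so everything concrete here is an assumption: raw term frequency stands in for the statistical method, entropy of a term's distribution across documents stands in for the information-based method, and a Borda count stands in for the social choice rule. All function names are hypothetical.

```python
import math
from collections import Counter


def term_frequency_ranking(docs):
    """Rank terms by total corpus frequency, most frequent first (assumed statistical method)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return sorted(counts, key=lambda t: (-counts[t], t))


def entropy_ranking(docs):
    """Rank terms by the entropy of their spread across documents, highest first
    (assumed information-based method). Terms spread evenly over many documents
    score high, which is typical of stop words."""
    per_doc = [Counter(doc) for doc in docs]
    totals = Counter(tok for doc in docs for tok in doc)
    entropy = {}
    for term, total in totals.items():
        h = 0.0
        for doc_counts in per_doc:
            if doc_counts[term]:
                p = doc_counts[term] / total
                h -= p * math.log2(p)
        entropy[term] = h
    return sorted(entropy, key=lambda t: (-entropy[t], t))


def borda_aggregate(rankings, top_k):
    """Merge rankings with a Borda count (assumed voting rule): a term in
    position pos of a ranking of length n receives n - pos points."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for pos, term in enumerate(ranking):
            scores[term] += n - pos
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def build_stop_word_list(docs, top_k):
    """Each ranking acts as one voter; the top_k winners form the candidate list."""
    rankings = [term_frequency_ranking(docs), entropy_ranking(docs)]
    return borda_aggregate(rankings, top_k)


# Toy tokenized corpus, invented purely for illustration.
docs = [
    ["यह", "किताब", "है"],
    ["यह", "घर", "है"],
    ["यह", "पेड़", "है", "और", "यह", "फल", "है"],
]
candidates = build_stop_word_list(docs, top_k=2)
```

On this toy corpus the two function words "यह" and "है" lead both rankings, so they are the two candidates returned. A real corpus would need proper Hindi tokenization and a cutoff chosen empirically.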