Article

Dimensionality reduction for sentiment analysis using pre-processing techniques

Mayuri Mhatre, Dakshata Phondekar, Pranali Kadam, Anushka Chawathe, Kranti Ghag (Information Technology Department, SAKEC, Mumbai, India)

Year: 2017. Language: English.


Abstract

Sentiment analysis is the study of people's opinions, sentiments, attitudes and emotions as expressed in written language, but the process is time-consuming, inconsistent and costly in a business context. Pre-processing the data helps to ease this difficulty. Pre-processing is the process of cleaning and preparing text for analysis. The existing pre-processing techniques are Handling Expressive Lengthening, Emoticons Handling, HTML Tags Removal, Punctuations Handling, Slangs Handling, Stopwords Removal, Stemming and Lemmatization. In this paper, the effect of these techniques and their combinations was analyzed on the Kaggle dataset Bag of Words Meets Bags of Popcorn. Every possible combination of pre-processing techniques was evaluated, with the aim of finding the one that gives the highest accuracy. A Random Forest classifier was used to predict sentiment, as it is known to give good accuracy, and the results were evaluated using 10-fold cross-validation. Accuracy increased from unprocessed data to pre-processed data. It was concluded that using pre-processing techniques gives higher accuracy than the traditional approach, i.e. no pre-processing.
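The combination search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only four of the eight listed techniques, the function names, the tiny stopword list and the regular expressions are all assumptions made for illustration, and it treats a "combination" as a subset applied in one fixed order. The classification step (Random Forest with 10-fold cross-validation) is left out; each subset produced here would be the input to that step.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}


def remove_html_tags(text):
    # Replace anything that looks like an HTML tag with a space.
    return re.sub(r"<[^>]+>", " ", text)


def handle_lengthening(text):
    # Collapse runs of three or more identical characters to two ("sooooo" -> "soo").
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def handle_punctuation(text):
    # Replace every character that is neither a word character nor whitespace.
    return re.sub(r"[^\w\s]", " ", text)


def remove_stopwords(text):
    # Drop tokens found in the stopword list (case-insensitive).
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


# Insertion order of this dict fixes the order in which techniques are applied.
TECHNIQUES = {
    "html": remove_html_tags,
    "lengthening": handle_lengthening,
    "punctuation": handle_punctuation,
    "stopwords": remove_stopwords,
}


def all_combinations(names):
    # Yield every subset of technique names, including the empty one
    # (the "no pre-processing" baseline).
    for r in range(len(names) + 1):
        yield from combinations(names, r)


def preprocess(text, combo):
    # Apply the chosen techniques in the given order, then normalize whitespace.
    for name in combo:
        text = TECHNIQUES[name](text)
    return " ".join(text.split())


if __name__ == "__main__":
    review = "<br />This movie was sooooo good!!!"
    for combo in all_combinations(list(TECHNIQUES)):
        print(combo, "->", preprocess(review, combo))
```

With four techniques the loop visits 2^4 = 16 subsets; under the same subset assumption, the eight techniques named in the abstract would give 2^8 = 256 candidates to evaluate.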


Identifiers

DOI: 10.1109/iccmc.2017.8282676

Citations and references

Citations: 2

References used: 0
