Real vs. Fake News Classification with Logistic Regression and TF-IDF Feature Extraction
Abstract
This study trains a logistic regression classifier to identify fake news, using a dataset that combines real and fake articles. Preprocessing consists of concatenating each article's title and text, lowercasing, removing punctuation and stop words, and applying TF-IDF vectorization with a maximum of 5,000 features. The data are split 80/20 into training and test sets, and the classifier is trained with the default Logistic Regression parameters, yielding an accuracy of about 99 percent on the training set and 99.01 percent on the test set. Because logistic regression is a classical machine learning algorithm, no batch size or number of epochs is specified; the default solver and its optimization settings converged rapidly and gave stable performance, and the classifier proved very effective at detecting fake news on this dataset.
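The preprocessing and vectorization steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. The stop-word list, the example articles, and the function names preprocess and tfidf_vectors are hypothetical, and the smoothed IDF formula with L2 normalization is one common variant assumed here because the abstract does not state the exact weighting.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the study's actual list is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


def preprocess(title, text):
    """Combine title and body, lowercase, strip punctuation, drop stop words."""
    combined = f"{title} {text}".lower()
    no_punct = combined.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def tfidf_vectors(docs, max_features=5000):
    """Return (vocabulary, one L2-normalized TF-IDF vector per tokenized document)."""
    n_docs = len(docs)
    term_counts = Counter(tok for doc in docs for tok in doc)
    # Keep at most max_features terms, chosen by corpus-wide frequency.
    vocab = sorted(term for term, _ in term_counts.most_common(max_features))
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors


# Made-up example articles as (title, text) pairs, for illustration only.
articles = [
    ("Breaking: Aliens Land!", "The aliens landed in the city, officials say."),
    ("Budget Approved", "The council approved the city budget on Monday."),
]
docs = [preprocess(title, text) for title, text in articles]
vocab, vectors = tfidf_vectors(docs)
```

In a full pipeline, the resulting vectors would be split 80/20 and passed to a logistic regression classifier. A library implementation of TF-IDF and logistic regression would normally replace the hand-written weighting shown here.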