Article

Real vs. Fake News Classification with Logistic Regression and TF-IDF Feature Extraction

Gaurav Singh, Graphic Era Deemed to be University, Department of Civil Engineering, Dehradun, India, 248002
Yunuskhujaev Khabibulla, Mamun University, Department of History, Urgench, Uzbekistan
Hamidbek S. Yusupov, Urgench State Pedagogical Institute, Department of History, Urgench, Uzbekistan
Elyorbek Yuldashov, Urgench State University, Department of English Language and Literature, Urgench, Uzbekistan
Etibor Sariyeva, Urgench Innovation University, Department of Primary Education and Psychology, Urgench, Uzbekistan
Harshit Pandey, Graphic Era Hill University Bhimtal, Dehradun, Uttarakhand, India, 248002

2025

ABI

Abstract

This study trains a logistic regression classifier to detect fake news on a dataset that combines real and fake articles. Preprocessing concatenates each article's title and text, lowercases the words, removes punctuation and stop words, and applies TF-IDF vectorization with a maximum of 5,000 features. The data are then split 80/20 into training and test sets, and the classifier is trained with the default Logistic Regression parameters, which yields an accuracy of about 99 percent on the training set and 99.01 percent on the test set. Because logistic regression is a classical ML algorithm, no batch size or number of epochs is specified; the default solver and learning rate parameters achieve rapid convergence and steady performance, and the resulting model is very effective at detecting fake news.
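The pipeline described in the abstract can be sketched in a few dozen lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the abstract does not name a library, a stop-word list, or a TF-IDF variant, so the small `STOP_WORDS` set, the smoothed IDF with L2 normalisation (the formulation used by default in libraries such as scikit-learn), the invented `TOY_ARTICLES` rows, and all function names are hypothetical. The classifier is a plain batch gradient-descent stand-in for a library logistic regression, so its `lr` and `epochs` arguments are choices of this sketch and not settings from the study.

```python
import math
import random
import string
from collections import Counter

# Illustrative stop-word list (an assumption; the abstract does not name one).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

def preprocess(title, text):
    """Combine title and body, lowercase, strip punctuation, drop stop words."""
    combined = f"{title} {text}".lower()
    combined = combined.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in combined.split() if tok not in STOP_WORDS]

def fit_tfidf(docs, max_features=5000):
    """Keep the max_features most frequent terms and compute smoothed IDF."""
    totals = Counter(tok for doc in docs for tok in doc)
    vocab = [term for term, _ in totals.most_common(max_features)]
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform_tfidf(doc, vocab, idf):
    """Return an L2-normalised TF-IDF vector for one tokenised document."""
    counts = Counter(doc)
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def split_80_20(rows, seed=0):
    """Shuffle and hold out 20% of the rows (at least one) for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(0.2 * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Gradient-descent stand-in for a library solver (lr, epochs are assumptions)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    """Label a TF-IDF vector: 1 = fake, 0 = real."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

TOY_ARTICLES = [  # (title, text, label) with 1 = fake, 0 = real; invented examples
    ("Shocking miracle cure!", "Doctors hate this secret trick.", 1),
    ("Secret miracle exposed", "Shocking trick cures everything!", 1),
    ("Council budget vote", "Officials report steady city growth.", 0),
    ("City budget approved", "Council growth figures published.", 0),
]

def fit_and_predict(train_rows, eval_rows):
    """Fit TF-IDF and the classifier on train_rows, then label eval_rows."""
    train_docs = [preprocess(t, x) for t, x, _ in train_rows]
    vocab, idf = fit_tfidf(train_docs)
    X = [transform_tfidf(d, vocab, idf) for d in train_docs]
    w, b = train_logreg(X, [label for _, _, label in train_rows])
    return [predict(transform_tfidf(preprocess(t, x), vocab, idf), w, b)
            for t, x, _ in eval_rows]

if __name__ == "__main__":
    train_rows, test_rows = split_80_20(TOY_ARTICLES)
    print("held-out predictions:", fit_and_predict(train_rows, test_rows))
```

The vocabulary and IDF weights are fitted on the training rows only and then reused for the held-out rows, which keeps test-set statistics out of the features. A four-article toy set says nothing about the accuracy figures reported above; it only shows how the steps connect.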

Topics

Misinformation and Its Impacts
Spam and Phishing Detection
Media Influence and Politics

Identifiers

DOI: 10.1109/medcom67532.2025.11405285

Citations and references

Cited by 05 references
