Article

A hybrid deep learning model for sentiment analysis of COVID-19 tweets with class balancing

Md. Alamin Talukder, Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Dhaka, Bangladesh

Md. Ashraf Uddin, School of Information Technology, Deakin University, Burwood Campus, Melbourne, Australia

Suman Roy, Department of Computer Science and Engineering, Jagannath University, Dhaka, 1100, Bangladesh

Partho Ghose, Department of Computer Science and Engineering, Jagannath University, Dhaka, 1100, Bangladesh

Smita Sarker, Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur, 5200, Bangladesh

Ansam Khraisat, School of Information Technology, Deakin University, Burwood Campus, Melbourne, Australia

Mohsin Kazi, Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box-2457, 11451, Riyadh, Saudi Arabia

Md Momtazur Rahman, Department of English and Modern Languages, International University of Business Agriculture and Technology, Dhaka, Bangladesh

Musawer Hakimi, Department of Computer Science, Samangan University, Northeast Aybak, Samangan Province, Afghanistan

Scientific Reports, 2025


Abstract

The widespread dissemination of misinformation and the diverse public sentiment observed during the COVID-19 pandemic highlight the necessity for accurate sentiment analysis of social media discourse. This study proposes a hybrid deep learning (DL) model that integrates Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction with Long Short-Term Memory (LSTM) networks for sequential learning to classify COVID-19-related sentiments. To enhance data quality, advanced text preprocessing techniques, including Unicode normalization, contraction expansion, and emoji conversion, are applied. Additionally, to mitigate class imbalance, Random OverSampling (ROS) is employed, leading to significant improvements in model performance. Before applying ROS, the model exhibited lower accuracy and inconsistent performance across sentiment categories. After balancing the dataset, accuracy for binary classification increased to 92.10%, with corresponding precision, sensitivity, and specificity of 92.10%, 92.10%, and 91.50%, respectively. For three-class sentiment classification, accuracy improved to 89.47%, with precision, sensitivity, and specificity of 89.80%, 89.47%, and 94.10%, respectively. In five-class sentiment classification, accuracy reached 81.78%, with precision, sensitivity, and specificity of 82.19%, 81.78%, and 95.28%, respectively. These findings demonstrate the efficacy of combining deep learning-based sentiment analysis with advanced text preprocessing and class balancing techniques for accurately classifying public sentiment related to COVID-19 across multiple sentiment categories.
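The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of random oversampling: minority-class samples are duplicated at random until every class is as large as the largest one. This is not the authors' code; the function name `random_oversample`, the toy tweets, and the fixed seed are assumptions made for illustration. Oversampling is usually applied to the training split only, so that duplicated tweets do not leak into evaluation; the abstract does not state how the split was handled.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples at random until all classes
    are the same size as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so that duplicated samples are not grouped by class.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


# Toy, made-up data: three positive tweets and one negative tweet.
texts = ["stay safe", "vaccines work", "great news today", "lockdown is awful"]
labels = ["positive", "positive", "positive", "negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain three samples, and every added sample is an exact copy of an existing minority-class tweet. No new text is synthesized, which is what distinguishes random oversampling from synthetic methods such as SMOTE.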

Topics

Sentiment Analysis and Opinion Mining; Misinformation and Its Impacts; Spam and Phishing Detection

Identifiers

DOI: 10.1038/s41598-025-97778-7

Citations and references

Cited by 0; 14 references
