Article

Machine Learning-Based Predictive Modelling for Early Diagnosis of Type 2 Diabetes Mellitus: A Comparative Analysis of Supervised Classification Algorithms

Rupanjali Singh, Indian Institute of Technology Jodhpur, NH-65, Nagaur Road, Nagaur, Rajasthan 342037, India
Priyanka Bhandari, Department of Pharmacology, School of Pharmaceutical Sciences, SGRR University, Patel Nagar, Dehradun, India
S Madan, Amity Institute of Pharmacy, Amity University, Sector 125, Noida, Uttar Pradesh 201313, India
S. K. Saxena, Amity Institute of Pharmacy, Amity University, Sector 125, Noida, Uttar Pradesh 201313, India
Dr. Nayyar Parvez, School of Pharmacy, Sharda University, Greater Noida, Uttar Pradesh, India
Lohitha Raja, Histology Biology Department, Fergana Medical Institute of Public Health, Yangi Turon 2A, Fergana 150100, Uzbekistan
Khaydarova Gulyora Zokirjon kizi, Department of Folk Medicine and Pharmacology, Fergana Medical Institute of Public Health, Yangi Turon 2A, Fergana 150100, Uzbekistan
Sumitha Vakati

International Journal of Drug Delivery Technology (journal), 2026

ABI

Abstract

Background: Type 2 diabetes mellitus (T2DM) is one of the most rapidly expanding metabolic disorders globally, with projections indicating that the affected population will surpass 783 million individuals by 2045. Timely and accurate prediction of T2DM at the pre-diabetic or early symptomatic stage is essential for reducing morbidity, limiting healthcare expenditure, and enabling targeted preventive interventions. Conventional clinical risk stratification tools often lack sufficient discriminatory power, underscoring the need for robust computational approaches.

Objective: The present study aimed to develop, train, and validate multiple supervised machine learning (ML) classification models to predict T2DM incidence using a curated dataset of clinical, biochemical, and lifestyle parameters, and to identify the optimal algorithm for clinical deployment.

Methods: A retrospective dataset of 10,892 patient records was assembled from the Pima Indians Diabetes Database, supplemented with clinical registry data. Features included fasting plasma glucose, glycated haemoglobin (HbA1c), body mass index (BMI), blood pressure, age, physical activity index, dietary quality score, family history, and socioeconomic indicators. After preprocessing (missing-value imputation, z-score normalization, and one-hot encoding), seven ML classifiers were trained: Logistic Regression, K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, eXtreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP), together with a Stacking Ensemble. Stratified 10-fold cross-validation was applied, and models were evaluated on accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC).

Results: The Stacking Ensemble achieved the highest overall performance, with accuracy of 91.6%, precision of 90.8%, recall of 89.4%, F1-score of 90.1%, specificity of 92.3%, and AUC-ROC of 0.948. XGBoost performed second best (AUC-ROC = 0.931; accuracy = 89.3%), followed by the MLP (AUC-ROC = 0.922). SHAP-based importance analysis identified glucose concentration and HbA1c as the most predictive features.

Conclusion: The proposed ensemble framework demonstrates superior discriminatory capability for early T2DM prediction and offers a scalable, non-invasive adjunct to conventional diagnostic protocols. Integration of such models within electronic health record systems and wearable health-monitoring platforms holds significant promise for population-level diabetes prevention.
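
The abstract does not include code, so the following is only a minimal sketch of the kind of workflow the Methods describe: imputation, z-score scaling, one-hot encoding, a stacking ensemble, and stratified 10-fold cross-validation with the listed metrics. It assumes scikit-learn, uses synthetic data, and the column names, median imputation, choice of base learners (a small subset, without XGBoost or the MLP), and hyperparameters are all illustrative assumptions rather than the authors' configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the study's schema.
rng = np.random.default_rng(0)
X_num, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_num, columns=["glucose", "hba1c", "bmi", "age"])
X["family_history"] = rng.choice(["yes", "no"], size=len(X))
# Simulate missing values in one numeric column.
X.loc[rng.choice(len(X), size=20, replace=False), "bmi"] = np.nan

numeric = ["glucose", "hba1c", "bmi", "age"]
categorical = ["family_history"]

# Imputation and z-score scaling for numeric columns, one-hot for categorical.
# Median imputation is an assumption; the abstract does not name a strategy.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Stacking ensemble over an illustrative subset of base learners.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
model = Pipeline([("preprocess", preprocess), ("stack", stack)])

# Specificity is recall computed on the negative class (label 0).
scoring = {
    "accuracy": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "f1": "f1",
    "roc_auc": "roc_auc",
    "specificity": make_scorer(recall_score, pos_label=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
mean_scores = {name: scores[f"test_{name}"].mean() for name in scoring}
for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

Placing the preprocessing inside the pipeline means the imputer, scaler, and encoder are refit on each training fold, which avoids leaking information from the held-out fold into the reported scores.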


Topics

Artificial Intelligence in Healthcare; Machine Learning in Healthcare; Diabetes, Cardiovascular Risks, and Lipoproteins

Identifiers

DOI: 10.25258/ijddt.16.35s.116

Citations and references

0 citations, 0 references used
