Advanced machine learning framework for enhancing breast cancer diagnostics through transcriptomic profiling

Mohamed J. Saadh (Faculty of Pharmacy, Middle East University, Amman, 11831, Jordan)
Hanan Hassan Ahmed
Radhwan Abdul Kareem (Ahl Al Bayt University, Kerbala, Iraq)
Anupam Yadav (Department of Computer Engineering and Application, GLA University, Mathura, 281406, India)
Subbulakshmi Ganesan (Department of Chemistry and Biochemistry, School of Sciences, JAIN (Deemed to Be University), Bangalore, Karnataka, India)
Aman Shankhyan (Centre for Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India)
Girish Chandra Sharma (Department of Applied Sciences-Chemistry, NIMS Institute of Engineering and Technology, NIMS University Rajasthan, Jaipur, India)
K. Satyam Naidu
Anvar Rakhmatullaev (Department of Faculty Pediatric Surgery, Tashkent Pediatric Medical Institute, Bogishamol Street 223, 100140, Tashkent, Uzbekistan)
Hayder Naji Sameer (Collage of Pharmacy, National University of Science and Technology, Dhi Qar, 64001, Iraq)
Ahmed Yaseen
Zainab H. Athab (Department of Pharmacy, Al-Zahrawi University College, Karbala, Iraq)
Mohaned Adil (Pharmacy College, Al-Farahidi University, Baghdad, Iraq)
Bagher Farhood (Department of Medical Physics and Radiology, Faculty of Paramedical Sciences, Kashan University of Medical Sciences, Kashan, Iran), [email protected]
Discover Oncology · journal · 2025 · en
Abstract

PURPOSE: This study proposes an advanced machine learning (ML) framework for breast cancer diagnostics that integrates transcriptomic profiling with optimized feature selection and classification techniques.

MATERIALS AND METHODS: A dataset of 1759 samples (987 breast cancer patients, 772 healthy controls) was analyzed, with Recursive Feature Elimination, Boruta, and ElasticNet used for feature selection. Dimensionality reduction techniques, including Non-Negative Matrix Factorization (NMF), Autoencoders, and transformer-based embeddings (BioBERT, DNABERT), were applied to enhance model interpretability. The classifiers XGBoost, LightGBM, ensemble Voting, Multi-Layer Perceptron, and Stacking were trained using grid search and cross-validation. Models were evaluated using accuracy, area under the ROC curve (AUC), Matthews correlation coefficient (MCC), Kappa score, and ROC and precision-recall (PR) curves, with external validation performed on an independent dataset of 175 samples.

RESULTS: XGBoost and LightGBM achieved the highest test accuracies (0.91 and 0.90, respectively) and AUC values (up to 0.92), particularly when combined with NMF and BioBERT. The ensemble Voting method achieved the best external accuracy (0.92), confirming its robustness. Transformer-based embeddings and advanced feature selection techniques significantly improved model performance compared to conventional approaches such as PCA and Decision Trees.

CONCLUSION: The proposed ML framework enhances diagnostic accuracy and interpretability and demonstrates strong generalizability on an external dataset. These findings highlight its potential for precision oncology and personalized breast cancer diagnostics.
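
The abstract reports accuracy, MCC, and Kappa score without stating how they are computed. As a minimal sketch, assuming a binary setup with 1 for cancer and 0 for control, the standard definitions of these three metrics can be written from confusion-matrix counts as follows. The function names and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of true and predicted labels.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
print("kappa:", cohen_kappa(*counts))
```

Unlike accuracy, MCC and kappa account for agreement expected by chance, which makes them more informative when the class sizes are unequal, as with the 987 patient and 772 control samples described above.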

Citations and references

Cited by 0 · 72 references