Comparative Study of Machine Learning Algorithms for Breast Cancer Diagnosis: A Clinician–Engineer Collaborative Approach
Abstract
Breast cancer is the most common cancer and the second leading cause of cancer-related deaths among women globally. Early and accurate diagnosis of malignant breast tumours helps improve the survival rate of patients [1]. In this study, we use a multidisciplinary clinician–engineer collaboration to demonstrate the potential of machine learning (ML) in the diagnosis of breast cancer. A publicly available dataset of 569 fine-needle aspirate samples (212 malignant, 357 benign) [2], each described by 30 quantitative cytological measures, was used to train and evaluate four classification models: Logistic Regression, Random Forest, Support Vector Machine (SVM), and Gradient Boosting. The data were randomly divided into 70% training and 30% testing sets after standard normalisation of the features. Model performance was evaluated using accuracy, sensitivity, specificity, and F1-score. SVM achieved the highest accuracy (96.5%), with a sensitivity of 93.7% and a specificity of 98.1%, slightly outperforming the other models. For clinical applications, high sensitivity means that few cancers are missed, and high specificity means fewer false alarms. Feature importance analysis showed that cell size and shape-based features (e.g., “worst” radius, perimeter, and area) contributed the most to the prediction of malignancy, consistent with known pathology guidelines. We discuss the clinical impact of such ML tools, their potential pitfalls (dataset bias, absence of external validation), and future directions such as prospective validation and image integration. In conclusion, our results indicate that ML models can reliably differentiate benign from malignant breast cytological lesions and show promise as a complement to clinical decision-making in oncology.
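The workflow summarised above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual code. It assumes that the dataset is the Wisconsin Diagnostic Breast Cancer set as bundled with scikit-learn (which has the same sample and feature counts), that "standard normalisation" means z-score scaling fitted on the training split, and that all models use library default hyperparameters. The names `metrics_from_counts` and `compare_models` are hypothetical. The confusion-matrix counts in the usage lines are not reported in the abstract; they are one assumed 171-sample test split (63 malignant, 108 benign) that happens to reproduce the reported SVM percentages.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def metrics_from_counts(tp, fn, tn, fp):
    """Compute the four reported metrics, treating malignant as the positive class."""
    return Metrics(
        accuracy=(tp + tn) / (tp + fn + tn + fp),
        sensitivity=tp / (tp + fn),       # share of malignant cases detected
        specificity=tn / (tn + fp),       # share of benign cases correctly cleared
        f1=2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    )


def compare_models(seed=0):
    """Train the four classifiers on a 70/30 split and return metrics per model.

    Assumption: scikit-learn's bundled dataset and default hyperparameters;
    the abstract does not state the settings actually used.
    """
    # Imported here so that the metric helper above works without scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    data = load_breast_cancer()
    X = data.data
    # scikit-learn codes malignant as 0; recode so that malignant = 1 (positive).
    y = (data.target == 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }

    results = {}
    for name, model in models.items():
        # The scaler is fitted on the training split only, avoiding test-set leakage.
        clf = make_pipeline(StandardScaler(), model)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
        results[name] = metrics_from_counts(tp, fn, tn, fp)
    return results


# Assumed counts (not from the paper): 59 of 63 malignant and 106 of 108 benign
# cases classified correctly.
example = metrics_from_counts(tp=59, fn=4, tn=106, fp=2)
print(f"accuracy={example.accuracy:.1%}, sensitivity={example.sensitivity:.1%}, "
      f"specificity={example.specificity:.1%}, F1={example.f1:.3f}")
```

Calling `compare_models()` returns one `Metrics` record per model. The exact figures depend on the random split and hyperparameters, so they should not be expected to match the reported values.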