Exploratory Data Analysis and Machine Learning for Cardiometabolic Risk Prediction, Stratified by Age and Gender
Abstract
Modern healthcare analytics depends on machine learning and data visualization to transform complex healthcare information into useful knowledge. This study performs Exploratory Data Analysis (EDA) and predictive modeling on a medical dataset to investigate patterns related to patient age and gender, and to explore the risk factors associated with cardiometabolic diseases. Using a Python-based environment, we analyze the relationship between gender and age, which are treated as independent variables, and the relationships among dependent variables such as cholesterol, diabetes, chest pain, and blood sugar. To address the limited availability of public datasets, we augmented our analysis with supplementary cardiovascular risk data from regional health surveys, increasing the effective sample size to $N = 512$. Our findings reveal significant differences in the frequency of cardiometabolic risk factors. Specifically, we observed a sharp increase in the rate of high cholesterol risk beginning in the 50-59 age group. For future studies, we recommend multilevel regression models that incorporate age × gender × BMI × blood pressure interactions, with restricted cubic splines (RCS) for continuous variables to capture non-linear threshold effects.
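The stratified comparison described above can be illustrated with a minimal sketch. It assumes each record carries an age in years, a gender label, and a binary high-cholesterol flag. The records, the function names `age_band` and `stratified_rates`, and the ten-year band width are hypothetical choices made for illustration; they are not the study's data or code.

```python
from collections import defaultdict

# Hypothetical toy records, NOT data from the study:
# (age in years, gender, high-cholesterol flag)
records = [
    (34, "F", 0), (38, "M", 0), (45, "F", 0), (47, "M", 1),
    (52, "F", 1), (55, "M", 1), (58, "F", 0), (63, "M", 1),
]


def age_band(age, width=10):
    """Return a band label such as '50-59' for an age in years."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def stratified_rates(rows):
    """Share of records with the flag set, per (age band, gender) stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [flagged, total]
    for age, gender, flag in rows:
        key = (age_band(age), gender)
        counts[key][0] += flag
        counts[key][1] += 1
    return {key: flagged / total for key, (flagged, total) in counts.items()}


rates = stratified_rates(records)
for (band, gender), rate in sorted(rates.items()):
    print(f"{band} {gender}: {rate:.2f}")
```

In a pandas workflow the same grouping would typically be written with `pd.cut` followed by `groupby`. The plain-Python version is used here only to keep the sketch free of dependencies.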