Article

Exploring the Impact of Dietary Habits and Physical Activity on Obesity Rates Using Apache Spark

I Ullah, Balochistan University of Information Technology Engineering and Management Science, Department of Software Engineering, Quetta, Pakistan
Uzair Aslam Bhatti, Hainan University, School of Information and Communication Engineering, Haikou, China
Nazeer Ahmed, Balochistan University of Information Technology Engineering and Management Science, Department of Software Engineering, Quetta, Pakistan
Bo Xu, Hainan University, School of Information and Communication Engineering, Haikou, China
Valisher Sapayev Odilbek Uglu, Mamun University, General Professional Science Department, Uzbekistan
Moudasra Shahreen, NUST Balochistan campus, Department of Computer Science, Quetta, Pakistan
2025
ABI

Abstract

Obesity is a major health issue in the United States and a multifaceted problem driven by the interplay of lifestyle, socio-economic, and demographic factors. This paper presents a large-scale analysis of the National Institute of Diabetes and Digestive and Kidney Diseases survey titled Nutrition, Physical Activity, and Obesity, conducted using Apache Spark. The data cover dietary habits, exercise intensity, and self-reported obesity indicators across different population groups. Relationships between variables were analyzed using Spark's distributed computing and its MLlib framework, with linear regression and random forest regression models. Results indicate that higher fruit and vegetable intake is not strongly correlated with lower obesity prevalence, whereas socio-economic variables such as income and education level are strongly correlated with physical activity and, consequently, with obesity prevalence. These findings underline the multifactorial nature of obesity and suggest that effective public health interventions should extend beyond dietary recommendations to address socio-economic disparities. By showing the usefulness of Apache Spark in processing and modeling massive health data, this study can be used to analyze complex trends in epidemiological data and to support evidence-based policy-making.
