ABSTRACT
The World Health Organization defines obesity as a body mass index (BMI) of 30 or higher and regards it as a global epidemic that poses significant public health risks, being linked to severe conditions such as diabetes, cardiovascular diseases, and cancers. Traditional diagnostic tools such as BMI calculators often fail to consider important factors such as physical activity, dietary habits, and genetic predispositions. This paper adopts a machine learning (ML) approach to improve obesity prediction, using a dataset from Mexico, Peru, and Colombia that incorporates critical characteristics such as lifestyle, physical condition, and eating habits. The proposed methodology applies SMOTE to address class imbalance and feature scaling to normalize the data, and it uses the feature selection methods Boruta, Recursive Feature Elimination (RFE), and LASSO to enhance performance. Eight machine learning classifiers, namely K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Decision Trees (DT), Bagging, Stacking, Voting, and AdaBoost, are tuned with GridSearchCV. The models are evaluated on accuracy, recall, precision, ROC-AUC, and F1-score. The results show that LR and Bagging consistently outperformed the other classifiers, achieving 93.97% and 93.13% accuracy respectively when feature selection was not used, whereas the stacking ensemble classifier demonstrated strong performance with optimized feature selection. Conversely, AdaBoost underperformed across all metrics, which is attributed to its sensitivity to dataset characteristics and feature reduction methods. These findings highlight the importance of optimizing feature selection for improving the overall performance of the ML models, as shown by the results obtained with Boruta, RFE, and LASSO.
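The two pre-processing ideas named in the abstract, oversampling a minority class and feature scaling, can be illustrated with a minimal sketch. This is not the study's implementation: the function names and the toy data are hypothetical, and the oversampler only reproduces the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours. A real pipeline would more likely rely on established libraries such as imbalanced-learn and scikit-learn.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea).
    Assumes the minority class holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def standardize(rows):
    """Scale each feature column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical toy data: (weight in kg, height in m) for an under-represented class.
minority = [(95.0, 1.60), (102.0, 1.65), (110.0, 1.70), (98.0, 1.58)]
synthetic = smote_like_oversample(minority, n_new=6)
scaled = standardize(minority + synthetic)
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region the original samples already cover. Scaling is applied afterwards here only for brevity; in a real experiment the scaler and the oversampler should be fitted on the training split alone to avoid leaking test information.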
In conclusion, this paper demonstrates the capability of machine learning to improve obesity level prediction by incorporating diverse data characteristics and optimized feature selection. It contributes meaningful insights into the adoption of ML models to mitigate the growing obesity-related health challenges, offering an alternative for more accurate and efficient health interventions.