Untangling the complexity of diabetes risk: a Bayesian approach to learning causal structures

Authors

  • Ney Michel Lituma Villamar, Universidad de Guayaquil, Guayaquil, Ecuador.

DOI:

https://doi.org/10.37711/rpcs.2025.7.3.12

Keywords:

diabetes mellitus, Bayesian networks, artificial intelligence, body mass index, hypertension, glycated hemoglobin A, algorithms, risk factors, prognosis, early diagnosis

Abstract

Objective: To evaluate the performance and interpretability of Bayesian network classifiers for the early detection of diabetes.

Methods: A machine learning model validation study in healthcare was conducted, focusing on the performance and explainability of algorithms applied to a preprocessed categorical dataset. The following classifiers were trained and applied: Naive Bayes, Tree Augmented Naive–Chow-Liu (TAN–Chow-Liu), Tree Augmented Naive–Hill Climbing with Super Parents (TAN–HCSP), Fast Super-Parent Search with Joint Mutual Information (FSSJ), and the K-Dependence Bayesian Classifier (KDB). Models were tested on 100,000 preprocessed records (variables filtered by causal relevance and discretized) using bnlearn and bnclassify. Data were partitioned 75/25 (training/testing), and accuracy, sensitivity, specificity, and F1 score were estimated. In addition, the learned structures were compared against clinical evidence.

Results: All models achieved accuracy ≥ 0.95 and F1 score > 0.94. FSSJ showed the best performance (accuracy 0.97; specificity 1.00), while Naive Bayes and KDB achieved comparable metrics at lower computational cost. The learned networks reproduced known associations among body mass index (BMI), hypertension, HbA1c, and glucose, and identified indirect chains (e.g., age influencing BMI, BMI influencing glucose, and glucose influencing diabetes), which supports their clinical plausibility.

Conclusions: Bayesian networks provide transparent, high-quality predictions of diabetes risk. Basic architectures can perform on par with more complex variants when preprocessing is rigorous. The causal pathways highlight modifiable factors (overweight, elevated blood pressure) as priority targets for preventive interventions.
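The evaluation protocol summarized in the abstract (a 75/25 training/testing partition followed by accuracy, sensitivity, specificity, and F1 score) can be sketched in a few lines. The study itself reports using the R packages bnlearn and bnclassify; the Python below is only an illustrative stand-in. The function names `split_75_25` and `binary_metrics`, the random seed, and the toy labels are assumptions made for this sketch and do not come from the article.

```python
import random


def split_75_25(records, seed=0):
    """Shuffle a copy of the records and split them 75% / 25%.

    The seed is an arbitrary choice for reproducibility (assumption).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of failing when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy labels, invented for illustration only (1 = diabetes, 0 = no diabetes).
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(truth, predicted))
```

Reporting sensitivity and specificity alongside accuracy matters when the positive class is rare, since a classifier can score high accuracy while missing most true cases.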


Published

2025-07-26

How to Cite

Lituma Villamar NM. Untangling the complexity of diabetes risk: a Bayesian approach to learning causal structures. Rev Peru Cienc Salud [Internet]. 2025 Jul. 26 [cited 2025 Oct. 6];7(3):226-33. Available from: https://revistas.udh.edu.pe/RPCS/article/view/871
