Untangling the complexity of diabetes risk: a Bayesian approach to learning causal structures
DOI: https://doi.org/10.37711/rpcs.2025.7.3.12
Keywords: diabetes mellitus, Bayesian networks, artificial intelligence, body mass index, hypertension, glycated hemoglobin A, algorithms, risk factors, prognosis, early diagnosis
Abstract
Objective: To evaluate the performance and interpretability of Bayesian network classifiers for the early detection of diabetes.
Methods: A machine learning model validation study in healthcare was conducted, assessing the performance and explainability of algorithms on a preprocessed categorical dataset. Five classifiers were trained and evaluated: Naive Bayes, Tree Augmented Naive–Chow-Liu (TAN–Chow-Liu), Tree Augmented Naive–Hill Climbing with Super Parents (TAN–HCSP), Forward Sequential Selection and Joining (FSSJ), and the K-Dependence Bayesian Classifier (KDB). The models were built with bnlearn and bnclassify on 100,000 preprocessed records (variables filtered by causal relevance and then discretized). Data were partitioned 75/25 (training/testing), and accuracy, sensitivity, specificity, and F1 score were estimated. In addition, the learned structures were compared against clinical evidence.
Results: All models achieved accuracy ≥ 0.95 and F1 score > 0.94. FSSJ showed the best performance (accuracy 0.97; specificity 1.00), while Naive Bayes and KDB achieved comparable metrics at lower computational cost. The learned networks reproduced known associations among body mass index (BMI), hypertension, HbA1c, and glucose, and identified indirect chains (e.g., age influencing BMI, BMI influencing glucose, and glucose influencing diabetes), which supports their clinical plausibility.
Conclusions: Bayesian networks provide transparent, high-quality predictions of diabetes risk. Basic architectures can perform on par with more complex variants when preprocessing is rigorous. The causal pathways highlight modifiable factors (overweight, elevated blood pressure) as priority targets for preventive interventions.
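The four metrics named in the Methods (accuracy, sensitivity, specificity, and F1 score) all derive from the counts of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code: the study reports using the R packages bnlearn and bnclassify, Python is used here only for illustration, the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the labels are invented toy data rather than study results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and F1 from paired labels.

    Hypothetical helper for illustration; `positive` marks the diabetes class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    # Tally the four cells of the confusion matrix.
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Return 0.0 when a denominator is empty (e.g. no positive cases).
        return num / den if den else 0.0

    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
        sensitivity=ratio(tp, tp + fn),      # true positive rate (recall)
        specificity=ratio(tn, tn + fp),      # true negative rate
        f1=ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    )


# Toy labels (assumed, not from the study): 4 positives and 6 negatives.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_truth, toy_pred)
print(toy_metrics)
```

On this toy input the counts are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, giving accuracy 0.80, sensitivity 0.75, specificity 5/6, and F1 0.75. Reporting specificity alongside sensitivity matters when the classes are imbalanced, because accuracy alone can look high even if most positive cases are missed.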
Copyright (c) 2025 Ney Michel Lituma Villamar

This work is licensed under a Creative Commons Attribution 4.0 International License.