Machine learning and automatic selection of attributes for the identification of Chagas disease from clinical and sociodemographic data
Keywords: Machine learning; Neural network; Chagas disease.
Objective: To evaluate the potential of machine learning and automatic attribute selection for discriminating between individuals with and without Chagas disease based on clinical and sociodemographic data. Method: After a preliminary evaluation of several learning algorithms, two were selected for comparison: the Multilayer Perceptron (MLP) neural network and Linear Regression (LR). The comparison sought the model with the best performance in predicting the Chagas disease diagnosis, using sensitivity, specificity, accuracy, and area under the ROC curve (AUC) as criteria. Models generated with three automatic attribute selection methods were also compared: Forward Selection, Backward Elimination, and a genetic algorithm. Results: The best results were obtained with the genetic algorithm and the MLP, which achieved an accuracy of 95.95%, a sensitivity of 78.30%, a specificity of 75.00%, and an AUC of 0.861. Conclusion: This performance is promising given the nature of the data used for classification and its intended use in public health. The approach is relevant to the medical field because it allows prevalence to be estimated, which supports active case finding of individuals with Chagas disease for treatment and prevention.
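The four evaluation criteria named in the Method can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. The label 1 stands for "disease present", and AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A positive/negative pair counts 1 if ranked correctly, 0.5 if tied, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, and AUC for one model's scores."""
    # The threshold is a hypothetical choice; the source does not state one.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "accuracy": (tp + tn) / len(y_true),    # fraction of correct predictions
        "auc": auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    print(evaluate(labels, model_scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason to report the four criteria together when comparing models.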
Copyright (c) 2021 Weber de Santana Teles; Aydano Pamponet Machado; Paulo Celso Curvelo Cantos Júnior; Cláudia Moura de Melo; Maria Hozana Santos Silva; Rute Nascimento da Silva; Veronica de Lourdes Sierpe Jeraldo
This work is licensed under a Creative Commons Attribution 4.0 International License.