Binary logistic regression model applied to data on accidents occurred on federal highways in Brazil
DOI: https://doi.org/10.33448/rsd-v11i15.36833
Keywords: Supervised analysis; Machine learning; Odds ratio; Lethality of accidents; Highway accidents.
Abstract
Accidents on federal highways in Brazil have social and economic impacts on the country. Data from the Federal Highway Police show that thousands of people die in these accidents every year. This paper examines the factors that influence the probability of death given that an accident has occurred. A binary logistic regression model was estimated on 2021 data, with death in the accident as the event of interest. After variable selection, the final model was validated on 2022 data. Model accuracy was around 70% for both the 2021 and 2022 data. Odds ratios were then calculated between selected categories to quantify how much each one increases accident lethality relative to its reference category. For example, in a crash, a pedestrian's odds of dying are 15.6 times those of the driver, while a cyclist's are 5.3 times. Although most accidents have a human cause, some results point to the need for public policies that can help reduce these tragedies. To explain the model, a dashboard was created in which the user can obtain the probability of death by selecting specific characteristics of the accident and of those involved.
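The relationship between the odds ratios quoted above and a predicted probability of death can be sketched in a few lines of Python. This is only an illustration, not the authors' code or dashboard: the two odds ratios (15.6 and 5.3, with the driver as reference) come from the abstract, while the baseline probability for a driver, the dictionary `ODDS_RATIOS`, and the function names are assumptions made up for the example.

```python
import math

# Odds ratios quoted in the abstract (reference category: driver).
ODDS_RATIOS = {"driver": 1.0, "cyclist": 5.3, "pedestrian": 15.6}

# Hypothetical baseline probability of death for a driver.
# This value is NOT from the paper; it is chosen only for illustration.
BASELINE_PROBABILITY = 0.02


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(log_odds):
    """Map log-odds back to a probability (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def death_probability(role, baseline_probability=BASELINE_PROBABILITY):
    """Shift the baseline log-odds by the coefficient ln(odds ratio)."""
    coefficient = math.log(ODDS_RATIOS[role])
    return inverse_logit(logit(baseline_probability) + coefficient)


if __name__ == "__main__":
    for role in ODDS_RATIOS:
        print(role, round(death_probability(role), 3))
```

Under this assumed baseline, the pedestrian's probability of death is less than 15.6 times the driver's, because an odds ratio multiplies odds rather than probabilities. The two are close only when the event is rare in both groups.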
License
Copyright (c) 2022 Damião Flávio dos Santos; Yuri Machado de Souza

This work is licensed under a Creative Commons Attribution 4.0 International License.