Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data

Diego Henrique Pegado Benício; João Carlos Xavier Junior; Kairon Ramon Sabino de Paiva; Juliana Dantas de Araújo Santos Camargo

doi:10.33448/rsd-v11i6.29184

Authors

Diego Henrique Pegado Benício, Digital Metropolis Institute; Federal University of Rio Grande do Norte, https://orcid.org/0000-0003-2750-0083
João Carlos Xavier Junior, Digital Metropolis Institute; Federal University of Rio Grande do Norte, https://orcid.org/0000-0003-1517-2211
Kairon Ramon Sabino de Paiva, Onofre Lopes University Hospital; Federal University of Rio Grande do Norte, https://orcid.org/0000-0001-9772-5101
Juliana Dantas de Araújo Santos Camargo, Maternity Hospital-School Januário Cicco; Federal University of Rio Grande do Norte, https://orcid.org/0000-0001-8692-5706

Keywords:

Text Mining; Natural Language Processing; Electronic Medical Record; Anamnesis.

Abstract

Healthcare professionals usually record patient data in electronic patient records (EPR) using free-text fields, which allows the same information to be described in many different ways (e.g., abbreviations, varying terminology). In such scenarios, retrieving data from this textual source with SQL (Structured Query Language) queries becomes infeasible. Based on this fact, we present in this paper a tool that applies Text Mining and Natural Language Processing techniques to extract comprehensible, standardized patient data from unstructured data. Our main goal is to carry out an automatic process of extracting, cleaning, and structuring data obtained from the EPRs of pregnant women at the Januário Cicco maternity hospital, located in Natal, Brazil. Our analysis, which compares the data retrieved manually by health professionals (e.g., physicians and nurses) with the data retrieved by our tool, used 3000 EPRs written in Portuguese. In addition, we applied the Kruskal-Wallis statistical test to evaluate the results obtained by the manual and automatic processes. The test showed no statistically significant difference between the two retrieval processes. In this sense, the results were considerably promising.
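To make the extract, clean, and structure idea concrete, here is a minimal Python sketch. It is not the authors' tool: the abbreviation map, the regular expressions, the field names, and the sample notes are all hypothetical, chosen only to show how a free-text Portuguese note might be turned into a fixed set of fields.

```python
import re
import unicodedata

# Hypothetical abbreviation map (assumption); a real clinical vocabulary
# would be far larger and would need context to resolve ambiguous tokens.
ABBREVIATIONS = {
    "ig": "idade gestacional",
    "pa": "pressao arterial",
    "sem": "semanas",
}

# Hypothetical patterns (assumption) applied to the cleaned, expanded text.
GESTATIONAL_AGE = re.compile(r"idade gestacional\D{0,5}(\d{1,2})\s*semanas")
BLOOD_PRESSURE = re.compile(r"pressao arterial\D{0,5}(\d{2,3})\s*[x/]\s*(\d{2,3})")


def clean(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip()


def expand_abbreviations(text: str) -> str:
    """Replace each alphabetic token found in the map with its standard form."""
    return re.sub(r"[a-z]+", lambda m: ABBREVIATIONS.get(m.group(0), m.group(0)), text)


def structure(note: str) -> dict:
    """Turn one free-text note into a record with fixed, typed fields."""
    text = expand_abbreviations(clean(note))
    record = {"gestational_age_weeks": None, "systolic": None, "diastolic": None}
    match = GESTATIONAL_AGE.search(text)
    if match:
        record["gestational_age_weeks"] = int(match.group(1))
    match = BLOOD_PRESSURE.search(text)
    if match:
        record["systolic"] = int(match.group(1))
        record["diastolic"] = int(match.group(2))
    return record


if __name__ == "__main__":
    # Two notes that describe the same kind of information in different ways.
    print(structure("Gestante, IG: 32 sem. PA 120x80 mmHg."))
    print(structure("Idade gestacional de 28 semanas, pressão arterial: 130/85"))
```

Both sample notes yield records with the same three fields, although one uses abbreviations and the other spells the terms out. Records in this shape could then be loaded into a table and retrieved with ordinary SQL queries, which is not practical against the raw text.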
