Automatic classification of products marketed by a public agency of Rio Grande do Norte through a committee of classifiers
DOI: https://doi.org/10.33448/rsd-v11i9.31836
Keywords: Text mining; Electronic Invoices; Data processing; Machine learning.
Abstract
The use of text mining techniques has grown considerably in recent years, driven by the large volume of text produced and stored by electronic systems and by organizations' need to turn this data into information. In this context, the Court of Auditors of the State of Rio Grande do Norte (Tribunal de Contas do Rio Grande do Norte, TCE-RN) receives a large number of electronic invoices every day, containing data on product purchases that must be analyzed for the benefit of society. However, these documents let the sellers who issue them fill in some fields freely, and often erroneously. As a result, the documents do not follow a standard, which makes it impractical to analyze them efficiently with common tools for retrieving and filtering data. Automated processing is therefore needed to standardize the data, make it available quickly and allow it to be used as information for audit purposes. This work presents a solution based on text mining and machine learning techniques for identifying the products commercialized in the state of Rio Grande do Norte from the description field of Electronic Invoices, so that these descriptions can be classified into unique products.
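The abstract does not detail the pipeline, so the following is only a minimal sketch, assuming that the "committee of classifiers" in the title combines its members by majority vote. It shows how free-text invoice descriptions could be normalized (accents, case, punctuation) and mapped to a canonical product label. The three members (a word-set nearest neighbor, a character-trigram nearest neighbor and a multinomial naive Bayes), the tie-breaking rule, and the sample descriptions and labels are all hypothetical illustrations, not the authors' method. The sketch uses only the Python standard library.

```python
# Hypothetical sketch: the committee members, the voting rule and the sample
# invoice descriptions below are illustrative assumptions, not the authors' method.
import math
import re
import unicodedata
from collections import Counter, defaultdict


def tokens(description: str) -> list[str]:
    """Strip accents ("AÇÚCAR" -> "acucar"), lowercase and split on non-alphanumerics."""
    text = unicodedata.normalize("NFKD", description)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    return re.findall(r"[a-z0-9]+", text)


def char_ngrams(description: str, n: int = 3) -> set[str]:
    """Character n-grams of the normalized text; tolerant of typos and abbreviations."""
    s = " ".join(tokens(description))
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class NearestNeighbor:
    """1-nearest neighbor under Jaccard similarity of a chosen feature set."""
    def __init__(self, features):
        self.features = features

    def fit(self, texts, labels):
        self.data = [(self.features(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        q = self.features(text)
        return max(self.data, key=lambda item: jaccard(q, item[0]))[1]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over word tokens."""
    def fit(self, texts, labels):
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        for t, y in zip(texts, labels):
            self.counts[y].update(tokens(t))
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.priors.values())
        words = [w for w in tokens(text) if w in self.vocab]  # ignore unseen words
        def log_score(y):
            c = self.counts[y]
            denom = sum(c.values()) + len(self.vocab)
            return math.log(self.priors[y] / total) + sum(
                math.log((c[w] + 1) / denom) for w in words)
        return max(self.priors, key=log_score)


class Committee:
    """Majority vote over member predictions; ties go to the earliest member's vote."""
    def __init__(self, members):
        self.members = members

    def fit(self, texts, labels):
        for member in self.members:
            member.fit(texts, labels)
        return self

    def predict(self, text):
        votes = [member.predict(text) for member in self.members]
        counts = Counter(votes)
        best = max(counts.values())
        return next(v for v in votes if counts[v] == best)


# Hypothetical training data: free-text descriptions -> canonical product label.
train = [
    ("ARROZ BRANCO TIPO 1 5KG", "arroz 5kg"),
    ("ARROZ TP1 PCT 5 KG", "arroz 5kg"),
    ("FEIJAO CARIOCA 1KG", "feijao 1kg"),
    ("FEIJÃO CARIOCA TIPO 1 1 KG", "feijao 1kg"),
    ("ACUCAR CRISTAL 2KG", "acucar 2kg"),
    ("AÇÚCAR CRISTAL PCT 2 KG", "acucar 2kg"),
]
texts, labels = zip(*train)
committee = Committee([
    NearestNeighbor(lambda t: set(tokens(t))),  # word-set neighbor
    NearestNeighbor(char_ngrams),               # character-trigram neighbor
    NaiveBayes(),
]).fit(texts, labels)

print(committee.predict("acucar cristal 2 kg"))  # -> acucar 2kg
```

Character trigrams are included because freely filled descriptions tend to contain abbreviations and misspellings that defeat exact word matching. A production system would more plausibly use trained models and vectorizers from a library such as scikit-learn; the voting committee shown here only illustrates the general idea.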
License
Copyright (c) 2022 Elvis Rafael Ferreira Dias; João Carlos Xavier Júnior
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.