Classification of specialty coffees using machine learning techniques
DOI: https://doi.org/10.33448/rsd-v10i5.14732
Keywords: Supervised classification; Classification models; Sensory analysis.
Abstract
Specialty coffees are of great economic importance, and their sensory quality is valued by both the production sector and the market. Research is constantly being carried out in search of better blends that add value and allow prices to be differentiated according to product quality. To accomplish this, new methodologies must be explored that take into account the factors distinguishing each consumer and/or product. This article therefore proposes the use of machine learning techniques to build supervised classification and identification models. In a consumer acceptance sensory test of four classes of specialty coffee, applied to four groups of trained and untrained consumers, attributes such as flavor, body, sweetness, and overall score were evaluated. The use of machine learning is viable, as it allows the classification and identification of specialty coffees produced at different altitudes and with different processing methods.
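To illustrate what a supervised classification model of this kind does, here is a minimal sketch, not the authors' pipeline: a plain 1-nearest-neighbour classifier with Euclidean distance, evaluated by leave-one-out cross-validation. The four attributes mirror those named above (flavor, body, sweetness, overall score), but the scores and the class labels `A` to `D` are hypothetical values invented for the example, not data from the study. The names `knn_predict` and `leave_one_out_accuracy` are likewise illustrative.

```python
import math
from collections import Counter

# HYPOTHETICAL data for illustration only (not from the study):
# each sample is ((flavor, body, sweetness, overall score), coffee_class).
SAMPLES = [
    ((7.5, 7.0, 7.2, 7.4), "A"),
    ((7.6, 7.1, 7.0, 7.5), "A"),
    ((6.1, 6.4, 5.9, 6.0), "B"),
    ((6.0, 6.2, 6.1, 6.2), "B"),
    ((8.4, 8.1, 8.3, 8.5), "C"),
    ((8.5, 8.0, 8.2, 8.4), "C"),
    ((5.0, 5.5, 5.2, 5.1), "D"),
    ((5.2, 5.3, 5.0, 5.0), "D"),
]


def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """Predict the class of `query` by majority vote of its k nearest training samples."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common returns the label first seen among the nearest samples.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=1):
    """Hold out each sample in turn, classify it using the rest, and return the fraction correct."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


if __name__ == "__main__":
    print("Leave-one-out accuracy:", leave_one_out_accuracy(SAMPLES))
    print("New cup classified as:", knn_predict(SAMPLES, (7.4, 7.0, 7.1, 7.3)))
```

Because all four attributes are scored on the same scale, the sketch skips feature standardization; with attributes on different scales, each one would normally be rescaled before distances are computed.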
License
Copyright (c) 2021 Paulo César Ossani; Diogo Francisco Rossoni; Marcelo Ângelo Cirillo; Flávio Meira Borém
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.