Classification of specialty coffees using machine learning techniques

DOI:

https://doi.org/10.33448/rsd-v10i5.14732

Keywords:

Supervised classification; Classification models; Sensory analysis.

Abstract

Specialty coffees are economically important, and their sensory quality is valued by both the productive sector and the market. Research is continually carried out in search of better blends that add value and allow prices to be differentiated according to product quality. To accomplish that, new methodologies must be explored that take into account the factors distinguishing individual consumers and products. This article therefore proposes the use of machine learning to build supervised classification and identification models. In a sensory consumer-acceptance test of four classes of specialty coffees, applied to four groups of trained and untrained consumers, the attributes flavor, body, sweetness and general grade were evaluated. The use of machine learning is viable because it allows the classification and identification of specialty coffees produced at different altitudes and by different processing methods.
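As a rough illustration of what a supervised classification model over these four sensory attributes might look like, the sketch below trains a nearest-centroid classifier on made-up scores. Everything in it is an assumption for illustration only: the class labels (`coffee_A` to `coffee_D`), the 0 to 10 scoring scale, the synthetic data, and the choice of a nearest-centroid rule. It is not the study's data, and it is not claimed to be one of the classifiers the authors evaluated.

```python
import random
from statistics import mean

# The four sensory attributes named in the abstract.
FEATURES = ("flavor", "body", "sweetness", "general_grade")


def make_synthetic_panel(seed=0, per_class=30):
    """Generate made-up scores for four hypothetical coffee classes."""
    rng = random.Random(seed)
    # Hypothetical class centers on an assumed 0-10 scale (not from the study).
    centers = {
        "coffee_A": (8.0, 7.5, 7.0, 8.0),
        "coffee_B": (6.5, 6.0, 6.5, 6.5),
        "coffee_C": (5.0, 5.5, 4.5, 5.0),
        "coffee_D": (3.5, 4.0, 3.0, 3.5),
    }
    rows = []
    for label, center in centers.items():
        for _ in range(per_class):
            # Gaussian noise around the center, clipped to the scale.
            scores = tuple(
                min(10.0, max(0.0, rng.gauss(c, 0.5))) for c in center
            )
            rows.append((scores, label))
    rng.shuffle(rows)
    return rows


def fit_centroids(rows):
    """Compute the mean score vector of each class in the training rows."""
    by_label = {}
    for scores, label in rows:
        by_label.setdefault(label, []).append(scores)
    return {
        label: tuple(mean(column) for column in zip(*group))
        for label, group in by_label.items()
    }


def predict(centroids, scores):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], scores))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, scores) == label for scores, label in rows)
    return hits / len(rows)


rows = make_synthetic_panel()
split = int(0.7 * len(rows))          # simple 70/30 hold-out split
train, test = rows[:split], rows[split:]
centroids = fit_centroids(train)
print(f"hold-out accuracy on synthetic data: {accuracy(centroids, test):.2f}")
```

The hold-out split keeps the evaluation rows out of training, which is the minimum needed for the reported accuracy to say anything about unseen evaluations. With real panel data, the synthetic generator would be replaced by the recorded scores and the known coffee class of each evaluation.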

Published

01/05/2021

How to Cite

OSSANI, P. C.; ROSSONI, D. F.; CIRILLO, M. Ângelo; BORÉM, F. M. Classification of specialty coffees using machine learning techniques. Research, Society and Development, [S. l.], v. 10, n. 5, p. e13110514732, 2021. DOI: 10.33448/rsd-v10i5.14732. Available at: https://rsdjournal.org/index.php/rsd/article/view/14732. Accessed: 9 Jan. 2025.

Section

Agrarian and Biological Sciences