Use of multivariate statistics to predict the physicochemical quality of milk

Multivariate analysis involves the application of statistical and computational methods to predict responses. Among the various multivariate statistical methods, principal component analysis stands out for predicting the composition and quality of food in general. The objective of this work was to characterize the milk producers of the municipality of Itapetinga-BA using principal component analysis. Twenty samples of raw milk were used, collected at the reception of a dairy located in Itapetinga-BA. The variables analyzed were fat, density, defatted dry extract, protein and lactose. The first two principal components explained 87.24% of the total variation. Different groups were formed, distributed across the four quadrants of the coordinate system. The first quadrant stood out from the others by forming a group of ten producers in the analyzed region, characterized by samples with higher lactose content and lower fat content. The lactose and fat variables were the most important in the characterization of the milk.


Introduction
Multivariate analysis involves the application of statistical and computational methods to predict, reduce, group and classify a set of data of interest. The variables in these data, which must be interrelated, are used simultaneously (Souza et al., 2012). Several methods are available for multivariate statistical analysis, among them principal component analysis and partial least squares, which are widely used in the development of analytical methods for predicting the composition and quality of food in general (Viana, 2018).
According to Filho et al. (2010), principal component analysis (PCA) is the method of choice for analyzing production systems because of its capacity to synthesize large data tables and to indicate the variables responsible for the diversity of production systems.
Milk is a whitish secretion produced by the mammary glands of female mammals, whose natural function is to feed the young at an early age. Some milk constituents are produced in the secretory cells and others come from the blood (Guetouache et al., 2014).
Lactose is a disaccharide characteristic of milk, composed of glucose and galactose. Its synthesis, which requires the milk protein α-lactalbumin, is directly related to the amount of milk produced, since lactose is responsible for drawing water from the blood into the mammary gland (Lucey et al., 2017).
Bovine milk contains about 200 different proteins, which can be divided into two main fractions: caseins (αs1-casein, αs2-casein, β-casein and κ-casein), which represent 80% of bovine milk proteins and are arranged in micellar complexes that give milk its milky appearance, and serum proteins (α-lactalbumin, β-lactoglobulin and albumin), which represent 20% of total bovine milk protein (D'Auria et al., 2018). In addition, the milk serum (also known as whey) contains lactoferrin, immunoglobulins, glycoproteins and enzymes (Abbring […]).
Fat in milk occurs as small globules containing mainly triacylglycerols, surrounded by a membrane of complex structure composed of various components, such as proteins, glycoproteins, enzymes and lipids (Zhao et al., 2019). Triacylglycerols make up 98% of the fat fraction of milk; the remaining 2% consists of monoacylglycerols, diacylglycerols, phospholipids, free fatty acids and cholesterol. In addition, the lipid fraction of milk is the most complex of all natural fats, considering that its triacylglycerols are formed by approximately 400 fatty acid esters (FAEs) (Pereira, 2014). Milk fat consists of 70% saturated and 30% unsaturated fatty acids. Among the saturated fatty acids, the most important quantitatively are palmitic (30%), myristic (11%) and stearic (12%). In the unsaturated fraction, oleic acid is present at concentrations between 24% and 35%, while the polyunsaturated acids are about 1.6% linoleic and 0.7% α-linolenic; trans fatty acids include vaccenic acid (2.7%) and conjugated linoleic acid (0.34% to 1.37%) (Meena et al., 2019).
The vitamin profile of milk includes the fat-soluble vitamins (A, D, E), associated with the fat globules, and the water-soluble vitamins (B complex and vitamin C). Milk stands out for its richness in B vitamins, contributing to the daily intake of vitamins B6 and B2, which are indispensable for normal body functions (Schmidt et al., 2017). In its mineral fraction, milk is recognized as a rich source of calcium, in addition to other elements such as phosphorus, magnesium, zinc and selenium (Pereira, 2014).
This work was conducted with the objective of using principal component analysis (PCA) to group the milk producers of the city of Itapetinga-BA, Brazil, as homogeneously as possible according to the similarity of the physicochemical characteristics of the milk received by the industry, and to identify which characteristics best explain the differences between production systems.
Research, Society and Development, v. 9, n. 4, e41942808, 2020 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v9i4.2808

Material and Methods
For this research, 20 milk samples were obtained from producers in the city of Itapetinga-BA, Brazil. They were acquired at the reception of a dairy located in the city, being […]
From the standardized data set, principal component analysis was performed, resulting in an ordination diagram according to the similarity of the variables considered. To choose the number of principal components, the method proposed by Cattell (1966) was adopted, which suggests plotting the magnitude of the eigenvalues against their order. The number of components retained was based on the breakpoint of the plot, where there is a sharp drop in the magnitude of the eigenvalues.
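The standardization and eigenvalue steps described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes NumPy, uses randomly generated placeholder data instead of the study's measurements, and the function names `standardize` and `scree_table` are hypothetical. It only produces the numbers that a scree plot would display; the choice of the breakpoint is left to visual inspection, as in the text.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def scree_table(X):
    """Return the correlation-matrix eigenvalues in decreasing order,
    the share of total variance of each, and the drop between neighbours."""
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)        # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()         # the eigenvalues of R sum to p
    drops = -np.diff(eigvals)               # size of each step down the scree
    return eigvals, share, drops


# Hypothetical data: 20 samples x 5 variables (NOT the measurements of this study).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:, 1] += 0.9 * X[:, 0]   # make some columns correlated so the scree has a shape
X[:, 2] += 0.9 * X[:, 0]

eigvals, share, drops = scree_table(X)
for k, (ev, s) in enumerate(zip(eigvals, share), start=1):
    print(f"PC{k}: eigenvalue = {ev:.3f}, variance explained = {100 * s:.2f}%")
print("drops between successive eigenvalues:", np.round(drops, 3))
```

Plotting `eigvals` against the component number gives the scree plot; the components before the point where the curve flattens are the ones retained.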
After determining the number of principal components, the scores for each principal component were estimated. The software OriginPro 8 was used to improve the visualization of the plots of the correlation matrix eigenvalues and the scatter plot of the samples.

Results and Discussion
As can be seen in Table 1 and graphically in Figure 1, the Cattell method led to the selection of the first two principal components (PCs), with an explanatory power of 87.24% of the total variance, demonstrating that the principal component technique was effective in summarizing the number of characteristics responsible for defining the groups. As a consequence, the characterization work is reduced and precision improves, while the analysis and interpretation of the data become less complex.
Table 2 shows the matrix of weights, which indicates the variables that best correlate with each component. The first principal component (PC1, 66.45%) presented a positive correlation with fat (G), density (D), defatted dry extract (ESD), protein (PTN) and lactose (LAC), with lactose making the smallest contribution. The second principal component (PC2, 20.79%) presented a negative correlation only with G and a positive correlation with the other variables, especially lactose, with which it was highly correlated. PC1 gave the best distribution of the data, with the physicochemical variables of greater importance for determining the characteristics of each group studied.
In Figure 2, the variables D, ESD and PTN are the most representative for PC1, because they are located at the end of the x axis, farthest from the origin of the Cartesian system, and therefore have the greatest influence. No correlation was observed between G and LAC, as indicated by the distance between them in the plot.
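A weight matrix of the kind discussed for Table 2 can be obtained as the correlation between each standardized variable and each component, which equals the eigenvector entry scaled by the square root of the eigenvalue. The sketch below illustrates this under stated assumptions: NumPy, placeholder random data (not the study's measurements), and a hypothetical helper `pca_loadings`. The only figures taken from the text are the two reported variance shares, whose sum is checked at the end.

```python
import numpy as np

LABELS = ["G", "D", "ESD", "PTN", "LAC"]  # abbreviations used in the text


def pca_loadings(X, n_components=2):
    """Correlations between the variables and the first principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)               # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)              # corr(variable, component)
    explained = eigvals / R.shape[0]                   # trace(R) equals p
    return loadings, explained


# Hypothetical data with some correlation between columns (NOT the study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X[:, 1:4] += X[:, [0]]

loadings, explained = pca_loadings(X)
for label, (w1, w2) in zip(LABELS, loadings):
    print(f"{label:>4}: PC1 = {w1:+.3f}, PC2 = {w2:+.3f}")

# Shares reported in the text for PC1 and PC2, and their cumulative value.
reported = [66.45, 20.79]
cumulative = round(sum(reported), 2)
print(f"cumulative variance explained: {cumulative:.2f}%")
```

The sign of each eigenvector is arbitrary, so a whole column of weights may come out with flipped signs depending on the software; the relative positions of the variables are unaffected.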
From the PC scores, the dispersion of the mean values of the milk samples was analyzed according to producer (Figure 3).

Analysis of the score plot indicated the formation of groups within the set of samples. The first quadrant contains the group with high correlation with PC2, indicating that this group produces milk with the highest lactose content and the lowest fat content.
Samples with positive correlation with both PC1 and PC2 are found in the second quadrant, showing the same pattern in lactose, density, defatted dry extract and protein levels.
The third quadrant contains the samples with positive correlation with PC1, indicating a low lactose content in these samples. Sample number 2 differed markedly from the others, not forming part of any group. The fourth quadrant contains the samples with negative correlation with both PC1 and PC2, showing low fat content and lower lactose content.
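The grouping by quadrant can be sketched as below. This is an illustration under assumptions, not the authors' computation: it uses NumPy and placeholder random data, and the helpers `pca_scores` and `quadrant` are hypothetical. The quadrant numbering (clockwise, starting from the upper-left) is inferred from the description in the text and is not confirmed by the figure.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Scores of each sample on the first principal components
    of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    return Z @ eigvecs[:, order]


def quadrant(pc1, pc2):
    """Quadrant of a point in the PC1 x PC2 plane.

    Assumed numbering, inferred from the text: 1 = (PC1 < 0, PC2 >= 0),
    2 = (PC1 >= 0, PC2 >= 0), 3 = (PC1 >= 0, PC2 < 0), 4 = (PC1 < 0, PC2 < 0).
    """
    if pc2 >= 0:
        return 1 if pc1 < 0 else 2
    return 3 if pc1 >= 0 else 4


# Hypothetical data: 20 producers x 5 variables (NOT the measurements of this study).
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))

scores = pca_scores(X)
groups = {q: [] for q in (1, 2, 3, 4)}
for producer, (pc1, pc2) in enumerate(scores, start=1):
    groups[quadrant(pc1, pc2)].append(producer)

for q, members in groups.items():
    print(f"quadrant {q}: producers {members}")
```

Because eigenvector signs are arbitrary, the same data may appear mirrored along either axis in different software, which swaps quadrant labels without changing which samples group together.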

Conclusion
Principal component analysis allowed the characterization and grouping of milk producers in the city of Itapetinga-BA into different groups. The distribution of samples in the scatter plot facilitated the visualization of samples with higher and lower values of fat, protein, defatted dry extract, lactose and density, in addition to showing the tendency of similar data to cluster and indicating the influence of the lactose and fat variables on the formation of groups.
The procedure presented in this work has great potential when physicochemical analysis by ultrasound is combined with multivariate tools, as it is a fast and safe process that can be used in the food industry as an alternative assessment of the physicochemical characteristics of milk from different producers, aiming to improve its quality. However, qualified labor is required to generate the results and evaluate the data.
These studies can be expanded by increasing the number of samples and the range of parameters analyzed, to improve knowledge of the composition and quality of milk in the studied region.