Determination of factors in pepper genotypes for ornamentation

The main objective of the present work was to use factor analysis to describe the structure of variability of characteristics considered commercially important in ornamental pepper, summarizing the information contained in these variables in a smaller number of latent variables, or factors. To that end, 12 quantitative traits were evaluated in 29 pepper genotypes (Capsicum annuum). Of the factors created, 3 had a practical interpretation, grouping a total of 12 variables into factors related to “fruit quality” (8 variables), “plant size” (2 variables) and “plant architecture” (2 variables).

Research, Society and Development, v. 9, n. 11, e58191110348, 2020 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v9i11.10348


Introduction
Pepper is an economically important crop worldwide. In addition to their importance in food, pharmacology, dentistry and medicine, peppers have great potential for ornamentation. Among pepper species, Capsicum annuum is the most widely planted for ornamental purposes, due to its small size and the great variability of fruit shapes and colors (Finger et al., 2012). Despite this great variability, few commercial varieties are used for ornamentation in the country (Vasconcelos et al., 2012). The ornamentals market lacks novelties, and new products add competitiveness to the sector and considerably increase the profit margin (Costa et al., 2019). The study of morphological and agronomic characteristics of cultivated plants is important for understanding the genetic divergence of the germplasm available for use in a breeding program (Elias et al., 2007).
Morphological characters can be studied individually or simultaneously. Simultaneous analysis allows conclusions to be drawn for more than one variable at a time and thus a better interpretation of the relationships between them. Multivariate statistics allows the study of complex phenomena, as it treats several variables simultaneously (Johnson & Wichern, 2007). Among the multivariate techniques, factor analysis makes it possible to reduce the dimension of the analyzed variables by grouping correlated variables into unobservable factors (latent variables), defined through the correlation between variables. After the factors are identified and interpreted, the latent variables can be predicted and their values used in later analyses (Silva et al., 2014).
Given the above, this study aimed to verify the empirical structure of morphoagronomic characteristics of C. annuum genotypes, so that correlated variables are grouped into a smaller number of latent (interpretable) variables, reducing the dimensionality of the data set.

Plant materials
The experiment was carried out between November 2017 and March 2018, in a greenhouse, at the Department of Phytotechnics of the Federal University of Viçosa (DFT/UFV). The city of Viçosa, in the state of Minas Gerais, is located at 650 m altitude, latitude 20°45'47" South and longitude 42°49'13" West. According to the Köppen classification, the city's climate is characterized by a cold, dry season between April and August and a hot, rainy season between September and March, with an average annual rainfall of 1,341 mm and mean maximum and minimum temperatures of 21.6 °C and 14.0 °C, respectively.
Twenty-nine genotypes of the species Capsicum annuum were evaluated, selected because they have potential for ornamentation: reduced size; fruits with different shapes; different colors at different stages of maturation; different positions and quantity of fruits produced; ease of cultivation; fruit and leaf durability; and the ability to grow in containers as a perennial plant (Neitzke et al., 2010). The experiment was installed in a greenhouse in a completely randomized design (CRD), with 29 treatments (genotypes) and five replications, with the experimental unit consisting of one plant per pot.
Sowing was carried out in November because the species requires high temperatures. The pepper plant requires high temperatures throughout the cycle; the ideal average monthly temperatures are between 21 °C and 30 °C (Rêgo et al., 2011). Temperatures recorded with a datalogger during the experiment ranged from 24 °C to 32 °C.
The seeds were sown in 200-cell polystyrene trays containing commercial substrate. Two seeds were placed in each cell to ensure that no cell was left empty. The trays were covered with a black cloth to speed up germination and kept suspended to favor the natural air pruning of the root system (Rêgo et al., 2011).
In January, seedlings with four pairs of true leaves were transplanted into 800 mL pots, and thinning was performed one week later. Manual irrigation was performed three times a day to maintain the moisture level, with enough water to start draining at the bottom of the tray (Rêgo et al., 2011). Fertilization and pest and weed control were carried out whenever necessary. The 29 Capsicum annuum genotypes evaluated, identified by number and common name, are shown in Table 1. The first 15 genotypes are from New Mexico, the next 3 are from the UFV germplasm bank and the remaining 11 are commercial varieties.

Morpho-agronomic characterization
For the morpho-agronomic characterization of the genotypes, the ornamental potential associated with characters of interest for consumption was taken into account, considering that pepper plants can have dual purposes. The descriptors established by the International Plant Genetic Resources Institute for the genus Capsicum were taken as a base (IPGRI, 1995).
In March, the fruits were collected and the necessary measurements and evaluations were carried out. For the development of the research, the quantitative method was used, in which the collection of quantitative data was carried out by means of point measurements of quantities using metrology. The collected numbers, with their respective units, generated data sets that were analyzed using mathematical techniques (Pereira et al., 2018).

Statistical analyses
The factor model adopted for an observable variable $X_i$ with mean $\mu_i$ can be represented as (Johnson & Wichern, 2007):

$$X_i = \mu_i + \ell_{i1}F_1 + \ell_{i2}F_2 + \cdots + \ell_{im}F_m + \varepsilon_i$$

where $X_i$ represents the observable variables with mean $\mu_i$, $i = 1, 2, \ldots, p$, and $m \le p$, in which $p$ is the number of observable variables; the elements $\ell_{ij}$ are the factor loadings associated with the $i$-th variable and the $j$-th common factor $F_j$, $j = 1, 2, \ldots, m$; $F_j$ are the common unobservable latent factors; and $\varepsilon_i$ are the random errors associated with the $i$-th variable.
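The model can also be written compactly in matrix form. The block below is a sketch under the standard assumptions of the orthogonal factor model in Johnson & Wichern (2007), which the text does not spell out: the common factors have zero mean and identity covariance, the errors have zero mean and a diagonal covariance matrix, and factors and errors are uncorrelated. The symbols $\mathbf{L}$ (the $p \times m$ matrix of loadings $\ell_{ij}$), $\boldsymbol{\Psi}$ and $\psi_i$ are notation introduced here for illustration.

```latex
\begin{aligned}
\mathbf{X} - \boldsymbol{\mu} &= \mathbf{L}\,\mathbf{F} + \boldsymbol{\varepsilon},
  \qquad E(\mathbf{F}) = \mathbf{0},\quad \operatorname{Cov}(\mathbf{F}) = \mathbf{I},\quad
  E(\boldsymbol{\varepsilon}) = \mathbf{0},\quad
  \operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \ldots, \psi_p) \\
\operatorname{Cov}(\mathbf{X}) &= \mathbf{L}\mathbf{L}^{\top} + \boldsymbol{\Psi} \\
\operatorname{Var}(X_i) &= \underbrace{\ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2}_{\text{communality}}
  \; + \underbrace{\psi_i}_{\text{specific variance}}
\end{aligned}
```

When the variables are standardized, $\operatorname{Var}(X_i) = 1$, so the communality can be read directly as the proportion of the variance of the $i$-th variable explained by the common factors.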
To measure the adequacy of the analysis, the Kaiser-Meyer-Olkin (KMO) criterion and Bartlett's sphericity test were used (Ferreira, 2011). The number of factors was defined using two criteria. The first criterion was the proportion of the total variance explained. A threshold of 70% of the total variability was adopted, which according to Ferreira (2011) is sufficient to satisfactorily reduce the data.
The second criterion was the eigenvalue criterion, or Kaiser criterion, in which the number of factors equals the number of eigenvalues greater than or equal to 1 (Kaiser, 1958). In addition to these criteria, the choice of the number of factors (m) took into account the interpretability of the factors and the principle of parsimony (Mingoti, 2005).
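As a rough illustration of the two adequacy measures, the sketch below computes Bartlett's sphericity statistic and the overall KMO index from a correlation matrix using NumPy. The function names, the small 3 x 3 correlation matrix and the sample size `n = 30` are hypothetical values chosen for illustration; they are not the data of this study. The chi-square statistic would still need to be compared with a chi-square distribution with the returned degrees of freedom to obtain a p-value.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's test statistic for H0: R is an identity matrix.

    R is a p x p correlation matrix and n the number of observations.
    Returns the chi-square statistic and its degrees of freedom.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df


def kmo_index(R):
    """Overall Kaiser-Meyer-Olkin index from a correlation matrix."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)        # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)  # mask of off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


# Hypothetical correlation matrix, for illustration only.
R_example = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
chi2_example, df_example = bartlett_sphericity(R_example, n=30)
kmo_example = kmo_index(R_example)
print(chi2_example, df_example, kmo_example)
```

A KMO value closer to 1 indicates that the partial correlations are small relative to the simple correlations, which is the situation in which factor analysis is expected to work well.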
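The two criteria for the number of factors can be sketched as follows. The function name `count_factors` and the eigenvalues in the example are hypothetical and purely illustrative; they are not the eigenvalues reported in Table 2.

```python
def count_factors(eigenvalues, min_eigenvalue=1.0, min_cum_var=0.70):
    """Apply the Kaiser and explained-variance criteria to a list of eigenvalues.

    Returns (number of factors by the Kaiser criterion,
             number of factors needed to reach min_cum_var,
             list of cumulative proportions of explained variance).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    # Kaiser criterion: eigenvalues greater than or equal to 1.
    kaiser = sum(1 for ev in ordered if ev >= min_eigenvalue)

    # Cumulative proportion of the total variance explained.
    cumulative = []
    running = 0.0
    for ev in ordered:
        running += ev / total
        cumulative.append(running)

    by_variance = next(
        i + 1 for i, c in enumerate(cumulative) if c >= min_cum_var
    )
    return kaiser, by_variance, cumulative


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
example_eigenvalues = [3.0, 1.5, 0.6, 0.4, 0.3, 0.2]
kaiser_m, variance_m, cum_var = count_factors(example_eigenvalues)
print(kaiser_m, variance_m, [round(c, 3) for c in cum_var])
```

When the two criteria disagree, the final choice still depends on interpretability and parsimony, as stated above.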
The relationship between variables and common factors was assessed through the factor loadings ($\ell_{ij}$), which represent the correlation between each variable and the respective factors. Like the simple correlation, the loadings vary between -1 and 1, and the higher the factor loading in absolute value, the greater the correlation between the variable and the respective factor. This makes it possible to name the factors based on the variables that are most related to them (Teixeira et al., 2015).
For a better interpretation of the distribution of variables across the factors, varimax rotation was used. To evaluate the proportion of the variance of each variable explained by the common factors and the proportion attributed to the random error, the communalities were calculated. After the factors were identified and interpreted, the scores for each factor were calculated. Through the scores it is possible to predict the factor values for each sample unit and use these latent variables (factors) in later analyses (Teixeira et al., 2015).
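One common way to obtain loadings that behave as correlations between variables and factors is the principal component method, in which each eigenvector of the correlation matrix is scaled by the square root of its eigenvalue. The text does not state which extraction method was used, so the sketch below is an assumption for illustration; the function name `pc_loadings` and the example matrix are hypothetical.

```python
import numpy as np


def pc_loadings(R, m):
    """Loadings of the first m factors by the principal component method.

    R is a p x p correlation matrix. Column j of the result is the j-th
    eigenvector scaled by the square root of the j-th largest eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    return eigvecs[:, order] * np.sqrt(eigvals[order])


# Hypothetical correlation matrix, for illustration only.
R_demo = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
loadings_2 = pc_loadings(R_demo, m=2)   # two retained factors
loadings_all = pc_loadings(R_demo, m=3) # all factors reproduce R exactly
print(np.round(loadings_2, 3))
```

The sign of each column is arbitrary, so a factor and its mirror image carry the same information; only the absolute size and the pattern of the loadings matter for naming the factors.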
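A minimal sketch of varimax rotation is given below, using a widely used SVD-based iterative scheme in NumPy. It is not necessarily the routine used by the authors, and the unrotated loading matrix in the example is hypothetical. The rotation is orthogonal, so it redistributes the loadings across factors without changing the communality of any variable.

```python
import numpy as np


def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x m loading matrix.

    Returns the rotated loadings and the m x m rotation matrix T.
    """
    p, m = L.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ T
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (LR ** 3 - (gamma / p) * LR @ np.diag(np.sum(LR ** 2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                      # closest orthogonal matrix
        d_new = s.sum()
        if d_new < d * (1 + tol):       # stop when the criterion stalls
            break
        d = d_new
    return L @ T, T


# Hypothetical unrotated loadings for 4 variables and 2 factors.
L_unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.5, -0.6],
    [0.4, -0.7],
])
L_rotated, T_rot = varimax(L_unrotated)
print(np.round(L_rotated, 3))
```

After rotation, each variable tends to load strongly on a single factor, which is what makes the grouping of variables easier to read.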

Results and Discussion
In order to assess the association structure between the variables, the correlation matrix was calculated. The existence of a correlation between the variables in the data set is of great importance for factor analysis, since this technique aims to identify relationships between variables. This relationship only exists in the presence of a correlation between them (Corrar et al., 2009).
To improve the visualization and interpretation of the correlations between the variables, a correlogram of the bivariate correlations was created (Figure 1). Positive correlations are shown in blue and negative correlations in pink, with stronger tones for stronger correlations (Silva et al., 2014). There are strong positive correlations among the variables FW, FL, FD, TP, NS/FR, FM, DM and CORD, and between the variables PH and SL, in addition to a very strong negative correlation between the variables CAD and SD. As most of the variables are highly correlated with each other, factor analysis can be expected to be promising.
The KMO index (0.74) is considered satisfactory by the criterion of Pallant (2007), who suggests 0.60 as a reasonable value, and Bartlett's sphericity test was statistically significant (p < 0.01), so the data are adequate for factor analysis. To determine the number of factors to be used, it is necessary to calculate the eigenvalues of the correlation matrix. Table 2 shows the eigenvalues and the accumulated variance of the 12 principal components obtained from the genetic correlation matrix, corresponding to the 12 characteristics evaluated. Eigenvalues are numbers that reflect the importance of the factor; each eigenvalue divided by the sum of all eigenvalues indicates the proportion of the total variability of the data that is explained by the factor (Silva et al., 2014).
Only the first three components are associated with eigenvalues greater than one. Thus, according to Kaiser's criterion (Kaiser, 1958), the data can be condensed into three factors. The accumulated variance of the first three components was 84.46%, indicating that these three factors are sufficient to represent the variability of the data. Mingoti (2005) suggests that the number of factors to be retained should reflect more than 70% of the original data variability.
The varimax rotation method was used to give the factors greater interpretability, making the factorial solution simpler and more meaningful (Johnson & Wichern, 2007). After varimax rotation, 3 factors were formed. Table 3 shows the 3 factors obtained by the factor analysis, with their respective factor loadings for each analyzed variable and the communality of each variable. The greater the factor loading, the greater the correlation of the variable with the factor. Each variable was allocated to the factor on which it had the greatest loading.
The first factor (F1) was made up of variables related to fruit characteristics (FW, FL, FD, TP, FM, DM, CORD). This result indicates that the variables related to fruits are highly correlated with each other, which makes it possible to name this factor "fruit quality". All of these variables have positive loadings, that is, the higher the value of these variables, the higher the score of the new variable formed.
The second factor (F2) was composed of variables related to plant length (PH, SL) and was named "plant size". As in the previous factor, the loadings of all variables were positive; thus, the score for this factor increases as the variables belonging to it increase. The third factor (F3) can be referred to as "plant architecture", since it grouped two characteristics related to diameter (SD, CAD) that help determine the architecture and harmony of the plant.
For all variables belonging to the factors with a practical interpretation mentioned above, the communalities were acceptable (> 0.60). According to Figueiredo Filho (2010), communalities must be greater than 0.5. The communality is the proportion of the variance of each variable that is explained by the common factors. Communalities can also be interpreted as indexes attributed to the original variables that express, in percentage terms, how much of the variability of each variable is explained by the model adopted. Thus, the higher the communalities, the better the fit of the factorial model (Silva et al., 2014).
The high communalities obtained in this study demonstrate that the factorial model fits the evaluated data set very well. The factors obtained represent a large part of the variability of the original data. Reducing the characteristics evaluated to just 3 factors facilitates the study and understanding of the correlation structure between variables. In addition, reducing the data set facilitates further analysis.
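The calculation behind a communality is simply the sum of the squared loadings of a variable across the retained factors. The sketch below illustrates this and flags variables below the 0.5 threshold mentioned above. The variable names and loadings are hypothetical and are not the values of Table 3.

```python
def communalities(loadings_by_variable):
    """Map each variable to the sum of its squared factor loadings."""
    return {
        name: sum(value * value for value in row)
        for name, row in loadings_by_variable.items()
    }


# Hypothetical loadings on three factors, for illustration only.
example_loadings = {
    "var_a": [0.90, 0.10, 0.05],
    "var_b": [0.20, 0.85, 0.10],
    "var_c": [0.40, 0.30, 0.20],
}

h2 = communalities(example_loadings)

# For standardized variables, 1 - communality is the specific (error) variance.
specific = {name: 1.0 - value for name, value in h2.items()}

# Variables whose communality falls below the 0.5 threshold.
poorly_explained = [name for name, value in h2.items() if value < 0.5]
print(h2, specific, poorly_explained)
```

In this made-up example, `var_c` would be flagged because the three factors together explain less than half of its variance.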

Conclusion
With factor analysis, the 12 characteristics evaluated were reduced to only 3 factors with practical interpretation, with a satisfactory percentage of explained variability.
Subsequently, the factors may have their values predicted for each sample unit through the scores. These scores can be used for analysis of variance using a smaller set of data. The scores can also be used to obtain selection indexes free of multicollinearity, since in factor analysis the highly correlated characteristics are grouped within one factor, while the correlation among factors is low or nonexistent.
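As an illustration of how such scores could be predicted, the sketch below uses the regression method, in which the scores are the standardized data multiplied by the inverse correlation matrix and the loading matrix. The text does not say which scoring method was used, so this is an assumption; the helper names, the simulated data and the principal component extraction are all hypothetical choices made for the example.

```python
import numpy as np


def pc_loadings(R, m):
    """Principal component loadings of the first m factors (assumed method)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def regression_scores(X, L):
    """Factor scores by the regression method: Z R^-1 L.

    X is an n x p data matrix and L a p x m loading matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    R = np.corrcoef(X, rowvar=False)
    return Z @ np.linalg.solve(R, L)


# Simulated data: 50 sample units, 4 variables driven by 2 hidden sources.
rng = np.random.default_rng(0)
sources = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.8, 0.0, 0.1],
                   [0.0, 0.2, 1.0, 0.9]])
X_sim = sources @ mixing + 0.5 * rng.normal(size=(50, 4))

L_sim = pc_loadings(np.corrcoef(X_sim, rowvar=False), m=2)
scores = regression_scores(X_sim, L_sim)
print(scores.shape)
```

With this combination of extraction and scoring, the score columns are uncorrelated with each other and have unit variance, which is the property that makes them convenient inputs for analysis of variance or for selection indexes.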