Evaluation of probability distributions in the analysis of minimum temperature series in Manaus – AM

The relevance of studying climatological phenomena lies in the influence that variables of this nature exert worldwide. Among the most observed variables, temperature stands out, since its variation may cause significant impacts on, for example, the proliferation of biological species, agricultural production and population health. Probability distributions have been studied to find the best fit for describing and/or predicting the behavior of climate variables. In this context, the present study evaluated which of six probability distributions best fits the historical series of monthly mean minimum temperature. The series used in this study covers a period of 38 years (1980 to 2018), separated by month, from the Manaus-AM weather station (OMM: 82331), obtained from INMET, totaling 459 observations. The Difference-Sign and Turning Point tests were used to verify data independence, and the maximum likelihood method was used to estimate the parameters. The Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests, the Akaike Information Criterion and quantile-quantile plots were used to select the best-fitting distribution. The Log-Normal, Gamma, Weibull, Gumbel type II, Benini and Rice distributions were evaluated, and the Rice, Log-Normal and Gumbel II distributions performed best.


Introduction
The relevance of studying climatological phenomena lies in the influence that variables of this nature have on different areas of knowledge and even on everyday life. Among the most observed variables, temperature stands out, since its variation can have significant impacts on, for example, the proliferation of animal and plant species, agricultural production and population health. From this perspective, historical series of climatic variables have been analyzed in order to describe and/or predict their behavior, as in the studies by Astolpho (2003), Aguirre et al. (2020) and Santiago et al. (2020), whose objective was to find the best fit for describing climatological measurements in Brazilian cities.
According to Fisch (1998), the regions most vulnerable to climate change in Brazil are the Amazon and the Northeast, which constitute what could be called climate change hot spots, associated with a high probability of a larger increase in average temperature (around five degrees Celsius by the end of the century) than is predicted for the rest of the Brazilian territory. According to Gomes (2015), the greater vulnerability of Manaus to climate change stems both from global climate variations due to natural causes and from changes within the Amazon region itself, such as changes in land use, that is, from anthropic causes.
According to Fisch (1998), the city of Manaus is located in the heart of the Amazon, one of the most humid regions in the country. The city's climate is humid tropical, characterized by high temperatures, high humidity and torrential rain. The same author also notes that researchers have been developing models, run on supercomputers that process series of all kinds of climate-related information, to try to predict future trends of climate change under different scenarios. Alexander et al. (2006), considering more than 1,400 meteorological stations, found an increase in minimum temperatures in 70.0% of the continental regions analyzed, including South America. Guarienti et al. (2004) point out that one reason for studying the behavior of minimum temperature is that wheat production in the country is strongly linked to this climatic variable. Araújo et al. (2010) note that identifying the probability distribution of variables associated with meteorological phenomena can support the planning of agricultural activities and the forecasting of the climatic behavior of a given region of the country, among other applications. Catalunha et al. (2002) highlight that the temperature of a region can be estimated in probabilistic terms by using probability distributions fitted to historical data series.
Also according to Catalunha et al. (2002), the choice of probability density function depends on the behavior of the data: these functions can be fitted to small or large data sets, and they differ in their number of parameters and in the shapes they can describe, such as asymmetry or a bathtub shape, among others.
In this context, the present study aims to evaluate which of six probability distributions offers the best fit to the historical series of monthly mean minimum temperature in the city of Manaus, Amazonas.

Material and Methods
The city of Manaus, Amazonas, in the Northern Region of Brazil (latitude 03°06'07'' S, longitude 60°01'30'' W), has an area of 11,401.092 km², an estimated population of 2,145,444 inhabitants and a density of 188.18 inhabitants/km² (IBGE, 2018), <www.ibge.gov.br>. The climate of Manaus is considered tropical humid monsoon (type Am in the Köppen-Geiger climate classification), with a compensated mean annual temperature of 27 °C, relatively high air humidity and a rainfall index of around 2,300 millimeters per year. The seasons are relatively well defined with respect to rain: winter is relatively dry and summer is rainy. Owing to the city's proximity to the Equator, the local climate is hot all year round. There are no cold days in winter, and even very intense polar air masses, which reach the south-central part of the country and the southwestern Amazon, rarely have any effect on the city. According to data from the National Institute of Meteorology (INMET), <https://portal.inmet.gov.br/>, the lowest temperature recorded in Manaus since 1961 was 12.1 °C, on July 9, 1989, and the highest was 39 °C, on September 21, 2015.
This quantitative research is characterized by the use of quantification both in the collection and in the treatment of information, through statistical techniques (Richardson, 1999; Pereira et al., 2018). The monthly series of mean minimum temperature used in this study covers a period of 38 years (1980 to 2018), separated by month, from the Manaus-AM weather station (OMM: 82331), compiled from the historical series of mean minimum temperature obtained from INMET. The Difference-Sign and Turning Point tests were applied to verify data independence. Parameter estimates were obtained by the maximum likelihood method. The Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Cramér-von Mises (CVM) tests, together with the Akaike Information Criterion (AIC), were used as criteria to identify the model that best fits the data. Quantile-quantile plots were also used as a criterion of adequacy and for selecting the best-fitting distribution.

Tests of independence
Many statistical procedures, such as those performed in this work, require a random sample (Brockwell & Davis, 2016). This condition does not always hold, but it can be checked with a statistical hypothesis test. We therefore test the hypothesis that the observations $y_1, \dots, y_n$ form a sequence of independent and identically distributed (i.i.d.) random variables.
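The Difference-Sign test named in the methods is not written out in the text. As a minimal sketch, assuming the large-sample normal approximation described by Brockwell & Davis (2016): let $S$ be the number of indices $i$ with $y_i > y_{i-1}$; under the i.i.d. hypothesis $E(S) = (n-1)/2$ and $\operatorname{Var}(S) = (n+1)/12$. The function name `difference_sign_test` is hypothetical and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def difference_sign_test(y):
    """Two-sided difference-sign test of the i.i.d. hypothesis.

    Returns (S, z, p_value), where S is the number of increases,
    z its standardized value and p_value the two-sided p-value
    from the normal approximation.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts the points that are larger than their predecessor
    s = sum(1 for i in range(1, n) if y[i] > y[i - 1])
    mean = (n - 1) / 2
    sd = math.sqrt((n + 1) / 12)
    z = (s - mean) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

A strictly increasing series such as `list(range(40))` gives S = 39 and a p-value near zero, so the i.i.d. hypothesis is rejected. Note that this test reacts to trends but not to cyclic patterns, since a strongly periodic series has roughly half of its points as increases.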

Turning Point Test
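The main idea of this test is that, if the sequence is random, three successive values are equally likely to occur in any of the six possible orders. In only four of these orders is there a turning point, namely when the greatest or the least of the three values is in the middle, i.e., $y_{i-1} < y_i > y_{i+1}$ or $y_{i-1} > y_i < y_{i+1}$. Hence, under the i.i.d. hypothesis, each interior point is a turning point with probability $2/3$, and the number $T$ of turning points in $y_1, \dots, y_n$ has
$$\mu_T = E(T) = \frac{2(n-2)}{3}, \qquad \sigma_T^2 = \operatorname{Var}(T) = \frac{16n - 29}{90}.$$
For large $n$, $T$ is approximately $N(\mu_T, \sigma_T^2)$, and the i.i.d. hypothesis is rejected at level $\alpha$ if $|T - \mu_T|/\sigma_T > z_{1-\alpha/2}$, the $(1-\alpha/2)$ quantile of the standard normal distribution (Brockwell & Davis, 2016).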
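The turning point test can be sketched in a few lines, assuming the normal approximation with mean $2(n-2)/3$ and variance $(16n-29)/90$ for the number of turning points (Brockwell & Davis, 2016). The function name `turning_point_test` is hypothetical; it is an illustration, not the routine used by the authors.

```python
import math
from statistics import NormalDist


def turning_point_test(y):
    """Two-sided turning point test of the i.i.d. hypothesis.

    Returns (T, z, p_value) using the large-sample normal approximation.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # A turning point is a local maximum or minimum of three consecutive values
    t = sum(
        1
        for i in range(1, n - 1)
        if (y[i - 1] < y[i] > y[i + 1]) or (y[i - 1] > y[i] < y[i + 1])
    )
    mean = 2 * (n - 2) / 3
    sd = math.sqrt((16 * n - 29) / 90)
    z = (t - mean) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return t, z, p_value
```

Both too few turning points (a smooth trend) and too many (a rapidly alternating series) lead to rejection, because the test is two-sided.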

Log-Normal distribution
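Let $X$ be a log-normally distributed random variable; then $Y = \ln X$ has a Normal distribution. Likewise, if $Y$ has a Normal distribution, then the exponential of $Y$, $X = e^{Y}$, has a Log-Normal distribution with p.d.f.
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0,$$
where $\mu$ and $\sigma$ are the mean and standard deviation of $\ln X$, with mean $E(X) = e^{\mu + \sigma^2/2}$ and variance $\operatorname{Var}(X) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$ (Johnson et al., 1995).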
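For the Log-Normal model, the maximum likelihood estimates have a closed form: $\hat{\mu}$ is the mean of the log-transformed data and $\hat{\sigma}^2$ their variance computed with divisor $n$. The sketch below assumes strictly positive observations (true for temperatures in °C in Manaus); the function names are hypothetical.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Log-Normal density; mu and sigma are the mean and s.d. of ln X."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def lognormal_mle(sample):
    """Closed-form maximum likelihood estimates (mu_hat, sigma_hat)."""
    logs = [math.log(v) for v in sample]
    n = len(logs)
    mu_hat = sum(logs) / n
    # The ML variance uses divisor n, not n - 1
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / n)
    return mu_hat, sigma_hat


def lognormal_mean_var(mu, sigma):
    """Mean and variance of X from the parameters of ln X."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var
```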

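Gamma distribution
The Gamma distribution (Shea, 1988) with parameters shape $= a$ and scale $= s$ has density
$$f(x) = \frac{1}{s^{a}\,\Gamma(a)}\,x^{a-1}e^{-x/s}, \quad x \ge 0,$$
where $a > 0$ and $s > 0$, with mean $E(X) = as$ and variance $\operatorname{Var}(X) = as^2$, and cumulative distribution function given by $F(x) = \gamma(a, x/s)/\Gamma(a)$, where $\gamma(a, z) = \int_0^z t^{a-1}e^{-t}\,dt$ is the lower incomplete gamma function.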
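The Gamma cumulative distribution function requires the regularized lower incomplete gamma function $\gamma(a, z)/\Gamma(a)$. A minimal sketch, assuming the shape $a$ / scale $s$ parameterization and evaluating that function by its power series $P(a,z) = \frac{z^a e^{-z}}{\Gamma(a+1)}\sum_{k\ge 0}\frac{z^k}{(a+1)\cdots(a+k)}$; this is adequate for moderate arguments but is not a production-grade implementation (libraries such as SciPy provide tuned versions). Function names are hypothetical.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with shape a and scale s (mean a*s, variance a*s**2)."""
    if x <= 0:
        # Zero outside the support; the single point x = 0 carries no probability
        return 0.0
    log_f = ((shape - 1) * math.log(x) - x / scale
             - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_f)


def gamma_cdf(x, shape, scale, tol=1e-15, max_iter=100_000):
    """F(x) = P(a, x/s), the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    # Accumulate sum_k z**k / ((a+1)...(a+k)), starting from the k = 0 term
    term = 1.0
    total = 1.0
    for k in range(1, max_iter):
        term *= z / (shape + k)
        total += term
        if term < tol * total:
            break
    # Prefactor z**a * exp(-z) / Gamma(a + 1), computed in log space
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1)
    return min(1.0, math.exp(log_prefactor) * total)
```

With shape 1 the model reduces to the exponential distribution, which gives an easy check: `gamma_cdf(2.0, 1.0, 1.0)` should equal $1 - e^{-2}$.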

Weibull distribution
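The Weibull distribution (Weibull et al., 1951), with shape parameter $k > 0$ and scale parameter $\lambda > 0$, has density given by
$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0,$$
and $f(x) = 0$ for $x < 0$, with mean $E(X) = \lambda\,\Gamma(1 + 1/k)$ and variance $\operatorname{Var}(X) = \lambda^2\left[\Gamma(1 + 2/k) - \Gamma^2(1 + 1/k)\right]$, and cumulative distribution function given by $F(x) = 1 - e^{-(x/\lambda)^{k}}$ for $x \ge 0$, and $F(x) = 0$ for $x < 0$.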
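The Weibull formulas translate directly into code. A minimal sketch, assuming the shape $k$ / scale $\lambda$ parameterization; the function names are hypothetical and the mean and variance use `math.gamma` for $\Gamma$.

```python
import math


def weibull_pdf(x, shape, scale):
    """Weibull density with shape k and scale lambda (zero for x < 0)."""
    if x <= 0:
        # f(x) = 0 for x < 0; the single point x = 0 carries no probability
        return 0.0
    u = x / scale
    return (shape / scale) * u ** (shape - 1) * math.exp(-(u**shape))


def weibull_cdf(x, shape, scale):
    """F(x) = 1 - exp(-(x/lambda)^k) for x >= 0, and 0 otherwise."""
    return 1 - math.exp(-((x / scale) ** shape)) if x >= 0 else 0.0


def weibull_mean_var(shape, scale):
    """Mean lambda*G(1+1/k) and variance lambda^2*[G(1+2/k) - G(1+1/k)^2]."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return scale * g1, scale**2 * (g2 - g1**2)
```

With $k = 1$ the Weibull reduces to the exponential distribution, so `weibull_mean_var(1.0, 2.0)` returns a mean of 2 and a variance of 4.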

Gumbel II distribution
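The Gumbel type II density (Gumbel, 1954) for a response $Y$, with scale $b > 0$ and shape $s > 0$, is
$$f(y) = \frac{s}{b}\left(\frac{y}{b}\right)^{-s-1}\exp\left[-\left(\frac{y}{b}\right)^{-s}\right], \quad y > 0.$$
The cumulative distribution function is $F(y) = \exp\left[-(y/b)^{-s}\right]$.
The mean and variance of $Y$ are given by $E(Y) = b\,\Gamma(1 - 1/s)$ when $s > 1$ and $\operatorname{Var}(Y) = b^2\left[\Gamma(1 - 2/s) - \Gamma^2(1 - 1/s)\right]$ when $s > 2$.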
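A minimal sketch of the Gumbel type II model, assuming the scale $b$ / shape $s$ (Fréchet) parameterization $F(y) = \exp[-(y/b)^{-s}]$; function names are hypothetical. The moment functions return infinity when the corresponding moment does not exist.

```python
import math


def gumbel2_pdf(y, scale, shape):
    """Gumbel type II density with scale b and shape s, for y > 0."""
    if y <= 0:
        return 0.0
    u = y / scale
    return (shape / scale) * u ** (-shape - 1) * math.exp(-(u ** (-shape)))


def gumbel2_cdf(y, scale, shape):
    """F(y) = exp(-(y/b)^(-s)) for y > 0."""
    if y <= 0:
        return 0.0
    return math.exp(-((y / scale) ** (-shape)))


def gumbel2_mean(scale, shape):
    # The mean is finite only when shape > 1
    return scale * math.gamma(1 - 1 / shape) if shape > 1 else math.inf


def gumbel2_var(scale, shape):
    # The variance is finite only when shape > 2
    if shape <= 2:
        return math.inf
    g1 = math.gamma(1 - 1 / shape)
    return scale**2 * (math.gamma(1 - 2 / shape) - g1**2)
```

At $y = b$ the cumulative distribution function equals $e^{-1}$ for any shape, and a numerical derivative of `gumbel2_cdf` should match `gumbel2_pdf`.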

Benini distribution
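The Benini distribution (Benini, 1905) has a probability density function that can be written as
$$f(y) = \frac{2s}{y}\,\ln\!\left(\frac{y}{y_0}\right)\exp\left\{-s\left[\ln\!\left(\frac{y}{y_0}\right)\right]^2\right\}$$
for $0 < y_0 < y$, with scale parameter $y_0$ and shape parameter $s > 0$.
The cumulative distribution function for $Y$ is $F(y) = 1 - \exp\left\{-s\left[\ln(y/y_0)\right]^2\right\}$.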
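The Benini model is easy to code, and if the scale $y_0$ is treated as known (an assumption of this sketch; it must lie below the smallest observation), the log-likelihood $n\ln(2s) - s\sum_i[\ln(y_i/y_0)]^2 + \dots$ gives the closed-form estimate $\hat{s} = n / \sum_i [\ln(y_i/y_0)]^2$. Function names are hypothetical.

```python
import math


def benini_pdf(y, y0, shape):
    """Benini density with scale y0 and shape s, defined for y > y0."""
    if y <= y0:
        return 0.0
    t = math.log(y / y0)
    return 2 * shape * t * math.exp(-shape * t * t) / y


def benini_cdf(y, y0, shape):
    """F(y) = 1 - exp(-s * ln(y/y0)^2) for y > y0."""
    if y <= y0:
        return 0.0
    t = math.log(y / y0)
    return 1 - math.exp(-shape * t * t)


def benini_quantile(p, y0, shape):
    # Inverts F(y) = p for 0 <= p < 1
    return y0 * math.exp(math.sqrt(-math.log(1 - p) / shape))


def benini_shape_mle(sample, y0):
    """ML estimate of s with y0 treated as known: n / sum(ln(y/y0)^2)."""
    if min(sample) <= y0:
        raise ValueError("all observations must exceed y0")
    return len(sample) / sum(math.log(y / y0) ** 2 for y in sample)
```

The quantile function is also what a quantile-quantile plot against the Benini model would need.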

Rice distribution
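The Rice distribution (Rice, 1945), with parameters $\nu \ge 0$ and $\sigma > 0$, has density
$$f(x) = \frac{x}{\sigma^2}\exp\left[-\frac{x^2 + \nu^2}{2\sigma^2}\right] I_0\!\left(\frac{x\nu}{\sigma^2}\right), \quad x \ge 0,$$
where $I_0$ is the modified Bessel function of the first kind of order zero.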
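Evaluating the Rice density at temperature-like values needs care: with, say, $\nu \approx 23$ and $\sigma \approx 0.6$ (illustrative values, not estimates from the study), the Bessel argument $x\nu/\sigma^2$ exceeds 1,400, so $I_0$ overflows and the exponential underflows if computed separately. This sketch, using the standard parameterization $f(x) = \frac{x}{\sigma^2}e^{-(x^2+\nu^2)/(2\sigma^2)}I_0(x\nu/\sigma^2)$, works in log space and sums the power series of $I_0$ with a log-sum-exp. Function names are hypothetical.

```python
import math


def log_bessel_i0(z):
    """log I0(z) for z >= 0, summing the power series in log space.

    I0(z) = sum_k (z/2)**(2k) / (k!)**2; the terms peak near k = z/2,
    so summing well past the peak is enough.
    """
    if z == 0:
        return 0.0
    log_half = math.log(z / 2)
    k_max = int(z / 2 + 20 * math.sqrt(z) + 50)
    logs = [2 * k * log_half - 2 * math.lgamma(k + 1) for k in range(k_max + 1)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))


def rice_pdf(x, nu, sigma):
    """Rice density with parameters nu >= 0 and sigma > 0."""
    if x <= 0:
        return 0.0
    s2 = sigma * sigma
    log_f = (math.log(x) - math.log(s2) - (x * x + nu * nu) / (2 * s2)
             + log_bessel_i0(x * nu / s2))
    return math.exp(log_f)
```

With $\nu = 0$ the density reduces to the Rayleigh density $\frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}$, which is a convenient sanity check.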

Goodness-of-fit criteria
The Akaike information criterion (AIC) provides a means of model selection and is used here as a criterion for selecting the model that best fits the data studied. The Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests are often used as adherence (goodness-of-fit) tests, but they can also measure the quality of the fit of a distribution to the analyzed data: the higher the p-value (greater adherence), the better the data fit the evaluated model. Similarly, quantile-quantile plots are commonly used to compare a data set with a theoretical model; they provide a graphical assessment of goodness of fit rather than a reduction to a numerical summary.

Akaike Information Criterion
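The Akaike information criterion was developed by Akaike (1974) and is defined as
$$\mathrm{AIC} = -2\ln\hat{L} + 2k,$$
where $\hat{L}$ is the maximized value of the likelihood function of the model and $k$ is the number of estimated parameters. Among the models evaluated, the one with the lowest AIC value is considered the best fit.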
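Selecting a model by AIC amounts to computing $-2\ln\hat{L} + 2k$ for each fitted candidate and keeping the minimum. In this sketch the maximized log-likelihoods are assumed to come from the separate maximum likelihood fits; the numbers in the usage line are invented for illustration and are not results from this study. Function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * maximized log-likelihood + 2 * number of parameters."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits):
    """fits maps a model name to (maximized log-likelihood, n_params).

    Returns the name with the lowest AIC and the dictionary of scores.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods, for illustration only
best, scores = best_by_aic(
    {"Log-Normal": (-30.2, 2), "Gamma": (-30.5, 2), "Benini": (-35.0, 1)}
)
```

Here the Log-Normal entry has the lowest AIC (64.4), so it would be selected. The penalty $2k$ only matters when candidates differ in their number of parameters, as the one-parameter Benini (with $y_0$ fixed) does.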

Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test (Durbin, 1973) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test). The Kolmogorov-Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. The empirical distribution function $F_n$ for $n$ independent and identically distributed observations $X_i$ is defined as
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty, x]}(X_i),$$
where $I_{(-\infty, x]}(X_i)$ is the indicator function, equal to 1 if $X_i \le x$ and 0 otherwise. For a given cumulative distribution function $F(x)$, the statistic is
$$D_n = \sup_x \left|F_n(x) - F(x)\right|,$$
where $\sup_x$ is the supremum of the set of distances. By the Glivenko-Cantelli theorem, if the sample comes from the distribution $F(x)$, then $D_n$ converges to 0 almost surely as $n$ tends to infinity.
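Because $F_n$ is a step function, the supremum $D_n$ is attained at a data point, on one side of a jump: with sorted values $x_{(i)}$, $D_n = \max_i \max\{i/n - F(x_{(i)}),\ F(x_{(i)}) - (i-1)/n\}$. A minimal sketch of the statistic only (no p-value), assuming `cdf` is any fitted cumulative distribution function; the function name is hypothetical.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x_(i); check both sides of the jump
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For the sample `[0.9, 0.1, 0.5]` against the uniform CDF on $[0, 1]$ (`lambda x: x`), the largest gap is $1/3 - 0.1 \approx 0.2333$.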

Anderson-Darling test
The Anderson-Darling test (Anderson & Darling, 1952) assesses whether a sample comes from a specified distribution. It makes use of the fact that, given a hypothesized underlying distribution and assuming that the data arise from it, the values of its cumulative distribution function at the data points can be assumed to follow a uniform distribution. The data can then be tested for uniformity with a distance test. The test statistic $A^2$ for assessing whether the data $\{Y_1 < \dots < Y_n\}$ (note that the data must be sorted) come from a cumulative distribution function $F$ is
$$A^2 = -n - S, \quad \text{where} \quad S = \sum_{i=1}^{n}\frac{2i - 1}{n}\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right].$$
The test statistic can then be compared with the critical values of the theoretical distribution. Note that, in this case, no parameters of the cumulative distribution function $F$ are estimated from the data.
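The $A^2$ formula can be sketched directly. Since a CDF is non-decreasing, sorting the values $F(Y_i)$ is equivalent to applying $F$ to the sorted data. This assumes every $F(Y_i)$ lies strictly between 0 and 1 (otherwise a logarithm is undefined); the function name is hypothetical and no critical values are included.

```python
import math


def anderson_darling_statistic(sample, cdf):
    """A^2 = -n - S, with S = sum (2i-1)/n [ln F(Y_i) + ln(1 - F(Y_{n+1-i}))]."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    if not all(0 < v < 1 for v in u):
        raise ValueError("F(Y_i) must lie strictly between 0 and 1")
    # u[i - 1] is F(Y_i) and u[n - i] is F(Y_{n+1-i}) with 1-based i
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

Compared with the Kolmogorov-Smirnov statistic, the weights in $A^2$ give more importance to discrepancies in the tails.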

Cramér-von Mises test
The Cramér-von Mises criterion is used to judge the quality of the fit of a cumulative distribution function $F$ compared with a given empirical distribution function (Braun, 1980; Csörgő & Faraway, 1996). Let $x_1, x_2, \dots, x_n$ be the observed values, in ascending order. The statistic is
$$T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i - 1}{2n} - F(x_i)\right]^2.$$
If this value is greater than the tabulated value, the hypothesis that the data come from the distribution $F$ in question can be rejected.
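The statistic $T = n\omega^2$ above can be computed in a few lines; as with the other sketches, `cdf` stands for an assumed fitted cumulative distribution function, the function name is hypothetical, and no tabulated critical values are included.

```python
def cramer_von_mises_statistic(sample, cdf):
    """T = 1/(12n) + sum_i [(2i - 1)/(2n) - F(x_i)]^2 over the sorted data."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    return 1 / (12 * n) + sum(((2 * i - 1) / (2 * n) - u[i - 1]) ** 2
                              for i in range(1, n + 1))
```

The smallest possible value, $1/(12n)$, occurs when every $F(x_i)$ sits exactly at the midpoint $(2i-1)/(2n)$; for example, the sample `[0.25, 0.75]` under the uniform CDF gives $1/24$.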

Quantile-Quantile plot
A probability plot, or quantile-quantile (Q-Q) plot, is a graphical technique devised by Wilk & Gnanadesikan (1968) to compare a data set with a particular probability distribution or with another data set. When comparing observations with a hypothesized distribution, let $X_1, \dots, X_n$ be a random sample from some unknown distribution with cumulative distribution function $F$, and let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the ordered observations. Depending on the particular formula used for the empirical distribution function, the $i$-th order statistic $x_{(i)}$ is an estimate of the $p_i$ quantile, with common choices including $p_i = i/(n+1)$ and $p_i = (i - 0.5)/n$. Supposing that $x_{(i)}$ estimates the $p_i$ quantile, i.e., $x_{(i)} \approx F^{-1}(p_i)$, the points $\left(F^{-1}(p_i), x_{(i)}\right)$ should lie approximately on a straight line if $F$ is an adequate model (Fox, 2016).
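The Q-Q coordinates can be sketched as pairs $(F^{-1}(p_i), x_{(i)})$. This sketch assumes the plotting position $p_i = (i - 0.5)/n$ (one of the common choices) and a caller-supplied quantile function. As a hypothetical numerical proxy for "greater linearity", it also computes the correlation between the two coordinates, in the spirit of Filliben's probability plot correlation coefficient; the study itself relied on visual inspection with simulated confidence envelopes, which are not reproduced here. Function names are hypothetical.

```python
import math


def qq_points(sample, quantile):
    """Pairs (theoretical quantile, observed order statistic)."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting position p_i = (i - 0.5) / n, one common choice
    return [(quantile((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def qq_correlation(points):
    """Pearson correlation of the Q-Q coordinates (1 means a perfect line)."""
    n = len(points)
    mq = sum(q for q, _ in points) / n
    mx = sum(x for _, x in points) / n
    sqx = sum((q - mq) * (x - mx) for q, x in points)
    sqq = sum((q - mq) ** 2 for q, _ in points)
    sxx = sum((x - mx) ** 2 for _, x in points)
    return sqx / math.sqrt(sqq * sxx)
```

For the Benini model, for example, the quantile function is $y_0\exp\{\sqrt{-\ln(1-p)/s}\}$; for Log-Normal or Gamma a numerical inverse of the CDF would be needed.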

Results and Discussion
The randomness tests applied to the data at the 5% significance level showed that only the Difference-Sign test rejected the hypothesis of randomness, in the months of February and December (p-value < 0.05). The Turning Point test did not reject randomness in any month (p-value > 0.05), also at the 5% significance level. The summary measures for the monthly mean minimum temperature data are presented in Table 1, and the boxplots of the series can be seen in Figure 1. The boxplots show varied behavior across the series: some months are symmetric, as in March, and others are right-skewed, as in September. This already suggests that the probability distribution selected to describe March may not be the most suitable to describe the mean minimum temperature data for September, for example, because the series behave differently.
Except for June, July and August, every month presented at least one outlier; that is, in at least one of the observed years the temperature for that month was very different from the others recorded. The descriptive measures of the data set can be seen in Table 2. Assuming that all distributions could adequately describe the monthly minimum temperature data, the parameters of each one were estimated. The estimated density curves of the distributions, plotted over the monthly histograms, can be seen in Figure 2 (source: the authors).
Research, Society and Development, v. 10, n. 3, e46210313616, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i3.13616
From a graphical point of view, the Rice distribution performs worst for March: its two parameter estimates are extremely different from those obtained for the other months, and its curve cannot be seen in the panel because it would require different limits for the x and y axes (x ranging from approximately 0 to 60 and y from 0 to 0.04). For the other months, the curves estimated by the Benini, Rice and Gumbel II distributions, as expected, approach the histogram only when the data are positively skewed. Except in January, the curves of the Gamma, Log-Normal and Rice distributions appear superimposed, indicating similar fits.
Regarding the adherence of the distributions to the data, assessed by the Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests, the three tests agree that the Gamma, Log-Normal, Gumbel II, Weibull and Rice distributions fit the data, except in January and June. The hypothesis that the data come from a Rice distribution is rejected by all three tests at the 5% significance level (p-value < 0.001) only for January.
For June, only the Kolmogorov-Smirnov test rejected the hypothesis that the data follow a Weibull distribution (p-value = 0.098). The data for the months of June to December adhere to the Benini distribution according to all three tests, while the results vary for the other months. To assess the quality of the fit, it was considered that the greater the adherence (the greater the p-value), the better the fit; the results of the adherence tests are given in Tables 3, 4 and 5. The results of the Akaike criterion can be seen in Table 6. In the inconclusive cases, where the tests were tied, namely May and June, the quantile-quantile plot was decisive in favor of the Log-Normal distribution in both cases, against the Rice and Gumbel II distributions, respectively. In such cases, the preferred distribution is the one whose quantile-quantile plot shows greater linearity and a greater number of points within the simulated confidence envelopes, as can be seen in Figure 3 (source: the authors). Table 7 lists the distributions selected as the best fit for the monthly mean minimum temperature data of the Manaus meteorological station according to the adopted criteria. Finally, the curves of the distributions selected for each month under these criteria can be seen in Figure 4.

Conclusion
The Rice, Log-Normal and Gumbel type II distributions were selected as the best fit to describe the series of mean minimum temperature at the Manaus station. It should be emphasized, as observed graphically and through the Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests, that in the cases where the Log-Normal distribution emerges as the one with the most appropriate fit, the Gamma and Rice distributions could also be adopted with little difference among them (except in January); they are therefore recommended for describing the behavior of mean minimum temperature data as potential competitors to the distributions usually employed. It is also important to highlight that, for mean minimum temperature data from other stations and/or other time intervals, the behavior of the data varies even though the data sets are of the same nature, and so does the distribution that can describe them; tests and distributions should therefore be compared again to obtain an adequate result. Future research can thus be carried out using other climatic variables, as well as in other states of Brazil, in order to investigate probabilistic models that describe such recurrences associated with climatic variables.