Artificial neural networks and remote sensing for volumetric prediction in a Eucalyptus sp. plantation

Forest inventory is an important tool for estimating the production of forest stands and normally employs traditional methods for volume estimation. However, with technological advances, artificial neural networks and remote sensing have assumed a prominent role in the forestry sector, since satellite images have components that correlate with dendrometric variables and can be used as auxiliary variables. The objective of this work was to evaluate the performance of artificial neural networks in estimating volume in a Eucalyptus sp. plantation using satellite images. Pre-harvest inventory data were used, with ages ranging from 5.3 to 6.3 years. The variables used were volume, age, four bands of Sentinel-2 satellite imagery at 10 m spatial resolution, the ratios between the bands, NDVI, and genetic material. All processing was performed using the free software R. The neural networks were evaluated by the percentage residual standard error and by graphical analysis of the residuals. The best neural network configuration for volume estimation presented residual standard errors of 10.63% and 12.00% for training and validation, respectively. The methodology proposed in this work proved efficient for estimating stand volume.


Introduction
In the last two decades, most research combining satellite images with Artificial Neural Networks (ANN) has focused on estimating the biomass of forest stands, expressed in units of mass (Frazier et al., 2014; López-Serrano et al., 2016; Lu et al., 2016; Sarker & Nichol, 2011; Wang et al., 2011). Although biomass can be converted into wood volume with bark using allometric equations, this has not been the aim of most of these studies. Coulibaly et al. (2008) mapped the biomass of a Canadian forest using ANN and kriging interpolation, with geospatial data and various vegetation indexes extracted from Ikonos satellite images. Wang & Xing (2008) and Zhu et al. (2015) applied ANN to model biomass in Chinese forests using spectral bands and vegetation indexes from the Landsat 5 and WorldView-2 satellites. Using SAR data, Del Frate & Solimini (2004) estimated biomass in forests located in France, French Guiana and the Netherlands; similarly, Santi et al. (2015, 2017) estimated biomass, in measures of volume and weight, in San Rossore park and in Molise, Italy. In India, Nandy et al. (2017) in the Barkot forest and Deb et al. (2017) in the Bundelkhand region estimated forest biomass using ANN, integrating field inventory data with spectral bands, texture and vegetation indexes from Resourcesat-1 and Resourcesat-2 satellite images. Almeida et al. (2009) estimated forest biomass in an area of the Amazon Rainforest by processing spectral bands and vegetation indexes derived from Landsat 5 images with ANN. Foody et al. (2001), using Landsat 4 and 5 TM images, and Cutler et al. (2012), using SAR and Landsat TM images, used ANN to estimate the biomass of tropical forests, with spectral bands and vegetation indexes as input variables in the first case and bands and textures in the second. Ferraz et al. (2014) performed a study in a Tropical Rainforest fragment, processing spectral bands and vegetation indexes from Ikonos satellite images with ANN.
In addition to these investigations, several studies worldwide have used ANN to estimate volume, biomass and other morphological parameters of trees in different types of forest stands, mainly using field data and disregarding remote sensing information (Bhering et al., 2015; Gorgens et al., 2009; Ingram et al., 2005; Jutras et al., 2009; Martins et al., 2016; Silva et al., 2009; Tavares Júnior et al., 2019; Vahedi, 2016).
Only a few studies have estimated stand volume or other stand parameters using ANN and remote sensing data, such as Landsat, SPOT and SAR images (dos Reis et al., 2018; Miguel et al., 2015; Moreno et al., 2019; Sakici & Günlü, 2018; Santi et al., 2015; Zhou et al., 2020). Zhou et al. (2020) applied ANN to determine the volume of pine wood in a forest area in China using SPOT satellite images, and Miguel et al. (2015) evaluated the performance of ANN for modeling wood volume with data from the ResourceSat-1 satellite. Moreover, no studies were found that map wood volume or biomass in forest stands by structuring ANN with data derived from Sentinel-2 satellite images, so research with this approach is needed to identify its potential application.
This study aims to evaluate the efficiency of an artificial neural network methodology, associated with Sentinel-2 satellite images, in estimating the volume (including bark) of a Eucalyptus sp. stand.

Methodology
The data came from 569 permanent rectangular plots, with an average area of 280.38 m² each, in a Eucalyptus sp. plantation composed of nine clonal varieties, with 3 m × 2 m spacing between plants and ages ranging from 5.3 to 6.3 years. In each plot, the circumference at breast height (1.3 m above ground) of all trees, the total height of the first five trees, and the dominant height were measured, according to Assmann (1970).
The plantation was located in the interior of the state of São Paulo, in the city of Botucatu, Brazil (Figure 1).
According to the Köppen climate classification, the local climate is hot temperate (mesothermal). The multispectral data consisted of Sentinel-2 satellite images. Images were selected so that their date was compatible with the date of the inventory, in order to minimize temporal variations in the forest typologies; both the forest inventory and the images were from 2016. The Sentinel-2 sensor acquires 13 spectral bands at spatial resolutions of 10 m, 20 m and 60 m. This work used the four 10 m bands: B02 (490 nm central wavelength, blue), B03 (560 nm, green), B04 (665 nm, red) and B08 (842 nm, near infrared).
The projection adopted was UTM zone 22 S (Universal Transverse Mercator), with the SIRGAS 2000 datum (Geocentric Reference System for the Americas).
Image information and vegetation indexes were obtained with the statistical software R (R Core Team, 2017), using the rgdal (Bivand et al., 2017) and raster (Hijmans, 2016) packages. All neural network processing, including training, tuning and application, was also performed in R.
Images were clipped to the farm stands (area of interest), keeping only the areas of effective planting. From these, a raster was generated containing the information of each pixel, i.e., its coordinates (x, y) and the respective gray levels of each of the four spectral bands. The normalized difference vegetation index (NDVI) was calculated, as well as the simple ratios between the bands (band 2 divided by band 3, band 2 divided by band 4, and so on). Band ratios make it possible to discriminate subtle differences in the spectral behavior of different targets, whereas the original bands show only gross differences (Araujo & Mello, 2010).
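The band ratios and NDVI described above can be computed per pixel as in the minimal Python sketch below. It is an illustration only, not the authors' R code: the band keys follow the input names used for the network (b2, b3, b4, b8), the pixel values are hypothetical gray levels, and NDVI is taken as (B08 - B04) / (B08 + B04), i.e., near infrared and red.

```python
from itertools import combinations

# Sentinel-2 10 m bands used in the study: B02 (blue), B03 (green), B04 (red), B08 (NIR)
BANDS = ("b2", "b3", "b4", "b8")


def band_ratios(pixel: dict[str, float]) -> dict[str, float]:
    """Simple ratios for every band pair, in order: b2/b3, b2/b4, b2/b8, b3/b4, b3/b8, b4/b8."""
    return {f"{a}_{b}": pixel[a] / pixel[b] for a, b in combinations(BANDS, 2)}


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); lies in [-1, 1] for non-negative inputs not both zero."""
    return (nir - red) / (nir + red)


# Hypothetical gray levels for a single vegetated pixel
pixel = {"b2": 300.0, "b3": 600.0, "b4": 300.0, "b8": 3000.0}
features = band_ratios(pixel)
features["ndvi"] = ndvi(nir=pixel["b8"], red=pixel["b4"])
print(features)
```

With four bands, the pairwise combinations give exactly the six ratios used as network inputs (b2_b3 to b4_b8); on a whole raster, the same arithmetic would be applied band-wise to every pixel.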
The networks were trained to estimate volume including bark. Age, the bands (blue, green, red and near infrared), the ratios between bands and NDVI were used as numerical input variables, and volume as the output variable. The genetic material was used as a categorical variable, coded as a sequence from 1 to 9.
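Because the network diagram lists the genetic material as nine inputs (matgen1 to matgen9), the 1-to-9 code appears to have been expanded into indicator (dummy) columns. The Python sketch below assumes that coding and assembles one 21-element input vector; the input order, the age and the pixel values are hypothetical.

```python
from itertools import combinations

BANDS = ("b2", "b3", "b4", "b8")
N_CLONES = 9  # genetic materials coded 1 to 9


def one_hot_clone(code: int, n_clones: int = N_CLONES) -> list[float]:
    """Expand a clone code (1..n_clones) into indicator columns matgen1..matgen<n>."""
    if not 1 <= code <= n_clones:
        raise ValueError(f"clone code must be in 1..{n_clones}, got {code}")
    return [1.0 if i == code else 0.0 for i in range(1, n_clones + 1)]


def input_vector(age: float, pixel: dict[str, float], clone: int) -> list[float]:
    """Assemble 21 inputs: age, NDVI, 4 bands, 6 band ratios and 9 clone indicators."""
    ndvi = (pixel["b8"] - pixel["b4"]) / (pixel["b8"] + pixel["b4"])
    bands = [pixel[b] for b in BANDS]
    ratios = [pixel[a] / pixel[b] for a, b in combinations(BANDS, 2)]
    return [age, ndvi, *bands, *ratios, *one_hot_clone(clone)]


# Hypothetical plot: 5.8 years old, clone 4
x = input_vector(5.8, {"b2": 300.0, "b3": 600.0, "b4": 300.0, "b8": 3000.0}, clone=4)
print(len(x))
```

The indicator columns are already 0 or 1, so only the 12 numerical inputs would need the normalization described in the following paragraphs.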
Volumes were obtained using Smalian's formula and ranged from 144 to 456 m³/ha; this variation may be due to damage (wind, fire, among others) in parts of the plantation. Band values were extracted from the raster using a 100 m buffer around the coordinates of each plot, and NDVI was obtained as NDVI = (NIR - Red) / (NIR + Red), where NIR is the reflectance in the near-infrared region and Red is the reflectance in the red region.
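Smalian's formula takes the volume of each stem section as the mean of its two end cross-sectional areas times the section length, V = (g1 + g2) / 2 × L. The Python sketch below applies it to circumferences (the variable measured in the field) and scales a plot total to m³/ha with the expansion factor 10 000 / plot area, using the 280.38 m² average plot area reported above. The section heights, the circumferences, the tree count and the omission of a separate cone term for the stem tip are assumptions for illustration; the text does not describe the cubing procedure.

```python
import math

AVERAGE_PLOT_AREA_M2 = 280.38  # average plot area reported for the inventory


def cross_section_m2(circumference_cm: float) -> float:
    """Cross-sectional area g (m²) from a circumference c: g = c² / (4π)."""
    c_m = circumference_cm / 100.0
    return c_m ** 2 / (4.0 * math.pi)


def smalian_volume(circumferences_cm: list[float], heights_m: list[float]) -> float:
    """Stem volume (m³) as the sum of Smalian sections (g_i + g_i+1) / 2 * (h_i+1 - h_i)."""
    if len(circumferences_cm) != len(heights_m) or len(heights_m) < 2:
        raise ValueError("need one circumference per height and at least two points")
    g = [cross_section_m2(c) for c in circumferences_cm]
    return sum(
        (g[i] + g[i + 1]) / 2.0 * (heights_m[i + 1] - heights_m[i])
        for i in range(len(g) - 1)
    )


def per_hectare(plot_total_m3: float, plot_area_m2: float = AVERAGE_PLOT_AREA_M2) -> float:
    """Expand a plot total (m³) to m³/ha: V_ha = V_plot * 10 000 / plot area."""
    return plot_total_m3 * 10_000.0 / plot_area_m2


# Hypothetical cubed tree: circumferences (cm) at section heights (m) along the stem
tree = smalian_volume([62.0, 55.0, 48.0, 36.0, 18.0], [0.1, 1.3, 6.0, 12.0, 18.0])
print(f"tree volume: {tree:.4f} m³")
# Hypothetical plot holding 47 such trees (about 280 m² at 3 m × 2 m spacing)
print(f"plot volume: {per_hectare(47 * tree):.1f} m³/ha")
```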
To estimate volume with bark, the data were randomly divided into two parts: 70% for training the networks and 30% for generalization, i.e., for applying the trained networks to data not used in training, as validation. After the training data were selected, the data were normalized by transforming each numerical variable into values between 0 and 1. Normalization homogenizes the variables and prevents variables with very large values from dominating the estimates (Gorgens et al., 2009).
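A minimal Python sketch of the 70/30 random split and the min-max normalization to [0, 1] follows. The text does not say whether the minimum and maximum were taken from the training subset or from all the data; here they are computed on the training subset only (an assumption) and reused for validation, so validation values may fall slightly outside [0, 1]. The seed is arbitrary, and the denormalize helper shows how a normalized network output would be converted back to m³/ha.

```python
import random


def split_70_30(rows: list, seed: int = 42) -> tuple[list, list]:
    """Randomly assign 70% of the rows to training and the remaining 30% to validation."""
    shuffled = rows[:]  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(train: list[list[float]]) -> list[tuple[float, float]]:
    """Column-wise (min, max) pairs computed on the training rows."""
    return [(min(col), max(col)) for col in zip(*train)]


def normalize(rows: list[list[float]], bounds: list[tuple[float, float]]) -> list[list[float]]:
    """Min-max scaling x' = (x - min) / (max - min); constant columns are mapped to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def denormalize(value: float, lo: float, hi: float) -> float:
    """Invert the scaling for one variable, e.g. a network output back to m³/ha."""
    return value * (hi - lo) + lo


train_ids, valid_ids = split_70_30(list(range(569)))  # 569 plots, identified by index
print(len(train_ids), len(valid_ids))
```

With the 569 plots of this inventory, the split gives 398 training and 171 validation plots.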
Network learning was supervised, i.e., the networks received two sets of values: an input set and an output set (Haykin, 2001b). Training thus consisted of optimizing the network parameters so that the network responded to the inputs as expected, until the error of the outputs generated by the network reached the desired minimum value.
The present study used a Multilayer Perceptron architecture with 21 neurons in the input layer, two hidden layers and one neuron in the output layer (Figure 2). This was the only architecture tested because, according to Chiarello et al. (2019), 78% of studies applying artificial neural networks in forest biometrics and modeling used the Multilayer Perceptron, whereas the second most used architecture, the Radial Basis Function network, was adopted in only 12%.
Research, Society and Development, v. 10, n. 12, e250101220466, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i12.20466
The training algorithm was backpropagation, which iteratively minimizes the difference between the desired outputs and the outputs produced by the network. The weights between the layers were adjusted by propagating backwards the error found in each iteration (Haykin, 2001a).
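The backpropagation procedure described above can be illustrated with the self-contained Python sketch below: a fully connected 21-8-3-1 perceptron, combining the input count stated here with the hidden-layer sizes of the best network reported in the Conclusion. It is not the authors' implementation. The text does not name the R package, activation functions, learning rate, weight initialization or stopping rule, so the logistic hidden units, linear output, squared-error loss and per-sample (online) updates used here are assumptions.

```python
import math
import random


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Fully connected perceptron; logistic hidden layers and a linear output (assumed)."""

    def __init__(self, sizes=(21, 8, 3, 1), seed=0):
        rng = random.Random(seed)
        # weights[l][j][i]: weight from neuron i in layer l to neuron j in layer l + 1
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [[0.0] * n_out for n_out in sizes[1:]]

    def forward(self, x):
        """Return the activations of every layer, from the inputs to the output."""
        acts = [list(x)]
        last = len(self.weights) - 1
        for l, (W, b) in enumerate(zip(self.weights, self.biases)):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj for row, bj in zip(W, b)]
            acts.append(z if l == last else [logistic(v) for v in z])
        return acts

    def train_step(self, x, y, lr=0.05):
        """One backpropagation update on a single sample; returns the squared error before it."""
        acts = self.forward(x)
        error = acts[-1][0] - y
        delta = [error]  # dE/dz at the linear output for E = 0.5 * error²
        for l in range(len(self.weights) - 1, -1, -1):
            a_prev = acts[l]
            if l > 0:  # error signal for the previous (logistic) layer, using current weights
                prev_delta = [
                    sum(self.weights[l][j][i] * d for j, d in enumerate(delta))
                    * a_prev[i] * (1.0 - a_prev[i])
                    for i in range(len(a_prev))
                ]
            for j, d in enumerate(delta):  # gradient-descent update of layer l
                for i, a in enumerate(a_prev):
                    self.weights[l][j][i] -= lr * d * a
                self.biases[l][j] -= lr * d
            if l > 0:
                delta = prev_delta
        return error ** 2


# Toy run on synthetic, already normalized data (not the inventory data)
rng = random.Random(1)
X = [[rng.random() for _ in range(21)] for _ in range(40)]
Y = [0.2 + 0.6 * sum(row[:4]) / 4 for row in X]
net = MLP()
for epoch in range(30):
    sse = sum(net.train_step(x, y) for x, y in zip(X, Y))
print(f"training SSE in the last epoch: {sse:.4f}")
```

For brevity the toy run stops after a fixed number of epochs, whereas the text describes training until the output error reaches a desired minimum; each hidden-layer configuration would be trained in this way and then compared by its residual standard errors.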
Twenty-five network configurations, differing in the number of neurons in the hidden layers, were evaluated. A reduction factor limited the number of neurons in the first hidden layer to half the number of input variables; similarly, a reduction factor limited the maximum number of neurons in the second hidden layer based on the number of neurons in the first.
The estimates of the artificial neural networks in the training and validation stages were evaluated by the residual standard error (Syx), in cubic meters per hectare and in percentage, where values closer to zero indicate more accurate estimates, and by graphical analysis of the residuals (m³/ha). The graphical analysis was based on the distribution of the residuals and on a quantile-quantile (Q-Q) plot, used to verify whether the frequency distribution of the residuals fit a normal distribution.
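The two evaluation criteria can be sketched in Python as follows. The residual is taken as observed minus estimated volume, and Syx% divides Syx by the mean observed volume, a common convention in forest mensuration. The text does not give the degrees of freedom, so the number of parameters p subtracted from n is a hypothetical argument defaulting to 0, and the Q-Q helper uses (i - 0.5)/n plotting positions, one choice among several.

```python
import math
from statistics import NormalDist, fmean


def residual_standard_error(observed, estimated, n_params=0):
    """Return (Syx in m³/ha, Syx% relative to the mean observed volume).

    Syx = sqrt(sum((y - y_hat)²) / (n - p)). The value of p (n_params) is an
    assumption here, because the text does not state the degrees of freedom.
    """
    n = len(observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))
    syx = math.sqrt(sse / (n - n_params))
    return syx, 100.0 * syx / fmean(observed)


def qq_points(residuals):
    """(theoretical N(0, 1) quantile, ordered standardized residual) pairs for a Q-Q plot."""
    n = len(residuals)
    mean = fmean(residuals)
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    ordered = sorted((r - mean) / sd for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n; points near the 1:1 line suggest normal residuals
    return [(std_normal.inv_cdf((i - 0.5) / n), r) for i, r in enumerate(ordered, start=1)]


# Hypothetical observed and estimated volumes (m³/ha)
obs = [100.0, 200.0, 300.0]
est = [110.0, 190.0, 300.0]
syx, syx_pct = residual_standard_error(obs, est)
print(f"Syx = {syx:.2f} m³/ha ({syx_pct:.2f}%)")
print(qq_points([y - e for y, e in zip(obs, est)]))
```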

Results and Discussion
The band means (Table 1) correspond to the average gray levels, which express the brightness of the image (Ribeiro et al., 2009), while the standard deviation of each band expresses its contrast. In the present work, B08 had the greatest standard deviation and therefore the greatest contrast among the bands, i.e., the clearest image, with the widest spread of gray levels. Conversely, a band with a low standard deviation has low contrast and produces a more uniform, less distinct image.
NDVI normalizes the simple ratio to the range from -1 to 1: areas with dense vegetation approach the upper limit, and wetlands approach the lower limit (Cordeiro et al., 2017). Table 1 shows that the average NDVI of the data is close to 1, indicating that the area generally has dense vegetation, without planting gaps or areas without vegetation.
Source: Authors (2021).
In the network diagram, the first layer (1) is the input layer and contains the variables used: NDVI, age (years), the bands b2, b3, b4 and b8, the ratios between the bands b2_b3, b2_b4, b2_b8, b3_b4, b3_b8 and b4_b8, and the genetic materials matgen1 to matgen9. The second (2) and third (3) layers are the hidden layers, and the last layer (4) is the output layer, i.e., the variable of interest (volume). The connections carry the weights of each neuron, which were updated at each iteration so that the final result had the smallest possible error between the estimated and observed values.
growth and production modelling at total stand level of Eucalyptus sp. clones and obtained 2-7-1 as the best architecture, with standard errors of 8.48% and 12.90% for training and validation, respectively. On the other hand, Miguel et al. (2015) obtained a standard error of estimate (Syx) of 4.93% for training and 6.01% for validation when evaluating neural networks for modeling wood volume with data from the ResourceSat-1 satellite. Sakici and Günlü (2018) found that, when estimating stand attributes (mean diameter, basal area, stand volume and number of trees) of Crimean pine stands using texture values from Landsat 8 OLI images, some ANN models performed better than multiple linear regression models. The R² values of the best ANN models were 48% to 239% higher than those of the regression models, depending on the stand parameter, and the ANN models were more accurate than linear regression for mixed, broadleaf and conifer forest types. Zhou et al. (2020) estimated the stock volume of pine plantations in China by processing spectral bands and image texture from the SPOT satellite with ANN, obtaining a Syx of 31.45% (45.44 m³/ha). In Italy, Santi et al. (2015) applied ANN to L- and C-band SAR images to determine the volume of wood in pine and oak forests, reaching Syx values of 40 m³/ha and 30 m³/ha, respectively.
Compared with the results of these authors, the results obtained with the database studied here are satisfactory. According to Oliveira (2012), better results can be achieved by analyzing the correlations between the information extracted from the images and the dendrometric data, by evaluating new combinations of activation functions, and by selecting, with the stepwise method, the input variables that influence the output variable (volume).
The validation error (12.00%) being close to the training error (10.63%) indicates that overfitting did not occur, i.e., the variables used in training were sufficient for the neural network to generalize and be applied to other data. Different neural network configurations, in terms of algorithm, number of hidden layers, activation function and input variables, among others, also influence the final results. Figure 4 shows a map of the volumes estimated by the neural network in the Eucalyptus sp. plantation; estimated volumes range from 228.40 to 366.50 m³/ha.
Although these statistics are good indicators of performance, a graphical analysis of the residuals is essential, since biased (trend) errors may occur without being detected by the statistics. Such errors lead to underestimation when the estimated value is lower than the observed value, and to overestimation otherwise. Figure 5 shows the dispersion of the residuals against the observed values and the quantile-quantile plot. The results and analyses presented show that the method was efficient for the proposed problem. Likewise, Silva et al. (2009) evaluated the performance of neural networks in estimating the volume of eucalyptus wood, concluded that the networks were suitable for the tested situations and therefore recommended their use for volume estimates.
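Biased errors of this kind can also be screened numerically. The Python sketch below, a hypothetical helper not described in the text, averages the residuals (observed minus estimated) within equal-width classes of observed volume: a positive class mean indicates underestimation in that range, and a negative one overestimation. The number of classes and the example values are assumptions.

```python
from statistics import fmean


def residual_trend(observed, estimated, n_classes=4):
    """(lower, upper, mean residual) per equal-width class of observed volume.

    Residual = observed - estimated, so a positive mean means underestimation.
    Empty classes get None instead of a mean.
    """
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_classes
    groups = [[] for _ in range(n_classes)]
    for y, y_hat in zip(observed, estimated):
        k = min(int((y - lo) / width), n_classes - 1) if width > 0 else 0
        groups[k].append(y - y_hat)
    return [
        (lo + k * width, lo + (k + 1) * width, fmean(g) if g else None)
        for k, g in enumerate(groups)
    ]


# Hypothetical volumes (m³/ha): estimates pulled toward the mean
obs = [150.0, 200.0, 300.0, 400.0, 450.0]
est = [200.0, 220.0, 300.0, 380.0, 400.0]
for lower, upper, mean_res in residual_trend(obs, est):
    print(f"{lower:6.1f} to {upper:6.1f} m³/ha: {mean_res}")
```

In this example the small volumes are overestimated and the large ones underestimated, a pattern that a single summary statistic such as Syx would not reveal.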
However, dos Reis et al. (2018), studying the spatial prediction of basal area and volume in Eucalyptus stands with Landsat TM data, found that ANN are generally more sensitive to variations in the input parameters than other methods, especially with a restricted dataset, resulting in estimates that were not compatible with the forest inventory estimates.
Dos Reis et al. (2018) also pointed out that their results should be interpreted with caution, as they are limited to a homogeneous and relatively small study area, but that the study still shows the importance of using remote sensing data and prediction methods for volume estimation. These statements reinforce the need for further studies using artificial neural networks to estimate forest parameters, in order to find reliable methodologies with more consistent results.

Conclusion
The ANN technique, combined with Sentinel-2 satellite images, made it possible to estimate the volume of Eucalyptus sp. plantations with acceptable residual standard errors of 10.63% for training and 12.00% for validation.
The network with the best estimates had 21 input variables and 8 and 3 neurons in the first and second hidden layers, respectively.
This methodology can be applied to other inventories without additional software or image costs, since all processing was carried out in the free software R with satellite images obtained free of charge.
New studies are encouraged, especially those using free image sources such as Landsat, MODIS, ASTER and Sentinel. In forest inventories, companies are always trying to reduce costs, and free satellite images combined with artificial neural networks are a good alternative for obtaining precise estimates without additional image costs.