Artificial neural network model for predicting load capacity of driven piles

In geotechnics, several models, empirical or otherwise, have been proposed for the calculation of load capacity in deep foundations. These models rely mainly on physical assumptions and on approximations constructed with mathematical models. Artificial Neural Networks (ANNs), among other applications, are computational mechanisms that, inspired by biological neural learning, can perform predictions and approximations of functions. In this work, an application of artificial neural networks is presented. The objective is to propose a mathematical model, based on artificial intelligence and focused on Artificial Neural Network (ANN) learning, capable of predicting the load capacity of driven piles. The results obtained through the neural network were compared with actual load capacity values obtained in the field through load tests. For this quantitative comparison, the following metrics were chosen: the Pearson correlation coefficient and the mean squared error. The database used to carry out the project consisted of 233 load tests, carried out in diverse cities and different countries, for which load capacity, hammer weight, hammer drop height, pile length, pile diameter and pile penetration per blow values were available. These values were used as inputs to a multilayer perceptron neural network to estimate the load capacities of the respective piles. It was found that the proposed neural model presented, in general, correlation with field values above 90%, reaching 96% in the best result.


Introduction
In engineering, and especially in geotechnics, several models, empirical or otherwise, have been proposed for the calculation of load capacity in deep foundations. These models rely mainly on physical assumptions and on approximations constructed with mathematical models.
A major problem in geotechnics is the calculation of the load capacity of deep foundations (Fellenius, 2020). Several ways to perform this task can be found in the literature, but the accuracy of such solutions is generally limited, mainly because some of the formulas used are obtained empirically or are rough approximations.
Foundations are the interfacing elements responsible for carrying any building resting on the earth (Bowles, 1996). Cintra & Aoki (2011) define a foundation as a system formed by structural elements of foundation (SEF) and the various layers of soil that surround them. It can thus be said that the foundation is the part of the construction responsible for receiving the loads of the structure and transmitting them to the underlying soil or rock on which it rests (Das, 2010; Bowles, 1996; Azeredo, 1977).
Considering its function, Bowles (1996) explains that the soil must be capable of supporting those loads without failure or excessive and intolerable settlement. To meet such requirements, the design of foundations generally requires knowledge of both the stress-related behavior and deformability and of the geological conditions of the soils that will support the foundation (Das, 2011). Foundations are usually classified into two major categories: shallow foundations and deep foundations. The classification criteria, although based on similar ideas, vary among authors. Das (2011, p.1) affirms that "in most shallow foundations the depth of embedment can be equal to or less than three to four times the width of the foundation".
According to Bowles (1996), foundations may be classified based on how deep in the ground the element sits: for shallow foundations, the depth is generally lower than the base dimension, while deep foundations are embedded to more than four times their base dimension. Another classification considers a deep foundation to be one whose base rupture mechanism does not reach the ground surface (Hachich et al., 1998; Velloso & Lopes, 2011). In turn, NBR 6122/2010 defines a deep foundation as the foundation element that transmits the load to the ground by tip resistance, shaft resistance, or a combination of the two, and whose tip or base lies at a depth greater than twice its smallest base dimension, and at least 3.0 m (ABNT, 2010).
Focusing on the second main group, Vesic (1963) explains that deep foundations can be divided into two types: the first refers to foundations installed by some process of excavation or drilling, which does not induce significant changes in the adjacent soil; the second is represented by foundations forced into the ground by operations such as driving, which promote significant changes in the bearing soil.
Development, v. 10, n. 1, e12210111526, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i1.11526
Investigation of the ultimate bearing capacity of deep foundations is fundamental (Vesic, 1963). To determine the ultimate capacity of an isolated pile, three verification mechanisms can be used: static formulas (theoretical or empirical), dynamic equations, or load tests.
Despite the numerous theoretical and experimental investigations already conducted to predict the behavior and load-bearing capacity of piles, their mechanisms are not yet completely understood (Das, 2011). The same author states that "the design and analysis of pile foundations may thus be considered somewhat of an art as a result of the uncertainties involved in working with some subsoil conditions" (Das, 2011, p.536).
The ultimate load capacity is obtained by the sum of the pile point capacity and the frictional resistance (skin friction) developed at the soil-pile interface (Das, 2011; Fellenius, 2020). Bowles (1996) states that although this idea is not extraordinarily complex, predictions of capacity close to actual load test values are not frequently obtained from it, since a lack of correspondence often occurs due to the difficulty of determining the in-situ soil properties and their changes in the pile's vicinity after installation. The soil's natural variability, coupled with the complex pile-soil interaction, makes accurate prediction a difficult problem (Bowles, 1996).
The mathematical construction of Artificial Neural Networks (ANNs) is based on the electrical, chemical and biological relationships that occur in the human nervous system. In this system, the importance of neurons stands out: neurons are excitable (or self-excitable) cells that communicate with each other through synapses, forming functional networks for processing and storing information (Haykin, 2001). The main characteristic of an ANN is the ability to "learn" the tasks to which it is assigned. In addition, ANNs can extrapolate this learning to new situations (generalization capacity). Mathematically, it can be said that the learning of an ANN consists of adjusting its set of weights to perform a specific task (Batista, 2012).
In an artificial neural network neurons are organized in the form of layers, and the way these layers are arranged defines the architecture of the network (Tian & Shang, 2006). According to Haykin (2001) the perceptron is the simplest form of a neural network initially used for classifying linearly separable patterns. This network consists of a single neuron with adjustable synaptic weights (Haykin, 2001). The Multilayer Perceptron (MLP) consists of a neural structure composed of a layer of input neurons, one or more hidden layers, and an output layer (Batista, 2012).
The most common training algorithm for an MLP is backpropagation. This algorithm uses the network output errors to retroactively update the synaptic weights, i.e., "from output to network input" (Yan et al., 2006). The backpropagation algorithm is implemented in two phases, forward and backward. In the forward phase, the input values are multiplied by the synaptic weights in the input-to-output direction, and at the end of this phase an error is calculated. In the backward phase, optimization techniques are applied to the error so that the synaptic weights are adjusted in the output-to-input order (Silva, Spatti & Flauzino, 2016).
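To make the two phases concrete, the sketch below implements one backpropagation step for a single-hidden-layer perceptron with sigmoid hidden neurons. The function names, the linear output neuron, and the simple delta-rule updates are illustrative assumptions, not the exact formulation of the model developed later in this article:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d, w1, w2, mu):
    """One backpropagation step for a single-hidden-layer MLP.
    x: input vector (with the bias term appended), d: desired output,
    w1: hidden-layer weight matrix, w2: output-layer weight vector,
    mu: learning rate."""
    # Forward phase: propagate the input toward the output and compute the error.
    h = sigmoid(w1 @ x)      # hidden-layer activations
    y = float(w2 @ h)        # single (linear) output neuron
    e = d - y                # output error
    # Backward phase: propagate the error back and adjust the weights
    # in the output-to-input order.
    w2 = w2 + mu * e * h                     # output-layer update
    delta_h = e * w2 * h * (1.0 - h)         # local gradients of hidden neurons
    w1 = w1 + mu * np.outer(delta_h, x)      # hidden-layer update
    return w1, w2, e
```

Repeating `train_step` over the training samples until the error stabilizes corresponds to the training phase described above.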
In this context, it is possible to minimize the errors and uncertainties arising from current models by using neural models to predict the load capacity of foundations, a possibility that is the main motivation of this work. Amancio (2013) proposed a perceptron-type network model for predicting settlement in deep foundations.
The author obtained a correlation of 0.89 as the best result. Araújo, Neto & Anjos (2015) also proposed a model for predicting pile settlement using ANNs; their model presented a correlation coefficient of 0.96 between the actual and estimated settlements in the validation phase. Erzin & Gul (2014) and Padmini, Ilamparuthi & Sudheer (2008) also proposed ANN-based methodologies for predicting load capacity in deep foundations. However, although all of these studies present a high correlation with real values, they do not report, for example, the values of the internal variables of the model, which makes them difficult to use in practice.
The objective here is to propose a mathematical model based on artificial intelligence focused on Artificial Neural Network (ANN) learning capable of predicting the load capacity for driven piles. Unlike the studies already existing in the literature, this article presents all the internal parameters and matrices that make up the model, so that any reader can use it.

Methodology
As previously stated, to evaluate the quality of the ultimate capacity estimates obtained, this work compares the load capacity predictions produced by ANN models with load capacities obtained through load tests. In this article, a network known as the Multilayer Perceptron is used.
Considering the theoretical discussions presented by Zanella (2013), this work can be classified, in methodological terms, as explanatory with respect to its objectives and as ex-post facto with respect to the data collection procedures. Regarding the classification proposed by Pereira (2018), this research makes use of the quantitative and statistical method. Silva et al. (2010) explain that the ANN-based prediction methodology is divided into two phases: training and testing.
In the training phase, part of the data is presented to the network to find the synaptic weights that minimize the errors between the network outputs and the desired values. In the test phase, the synaptic weights found in the training are used as parameters of the network and the errors are calculated.
The proposed model has been compared with real results obtained in load tests and evaluated through the mean squared error and the correlation between the model's outputs and field values. The database used to carry out the project consisted of 233 load tests (Table 1), carried out in diverse cities and different countries, for which load capacity, hammer weight, hammer drop height, pile length and pile diameter values were available. The database consists of load tests both reported in the literature and monitored in the authors' professional practice. Among the 233 tests available, 153 also had information on the modulus of elasticity of the pile. For the 80 piles that did not have the exact value of this parameter, a conservative estimated value of 25 GPa was adopted. Then, since this parameter is not always known in advance, the proposed models have been developed under two conditions: one including the modulus of elasticity and another that did not include it. The results, as well as the topology of the networks and their parameters, will be presented in graphs and tables.
To evaluate the quality of the predictions made by the proposed model, the results obtained have been compared with the actual results obtained through load tests. For this quantitative comparison, the following metrics have been chosen: Pearson correlation coefficient and mean squared error.
Correlation identifies whether two groups of data bear some relationship to each other, that is, whether high (or low) values of one variable imply high (or low) values of the other. A correlation analysis provides a number that summarizes the degree of linear relationship between the two variables, called the correlation coefficient. The correlation coefficient (Equation 1) was chosen because it is a metric widely used to evaluate comparisons such as the one this research makes (Benesty et al., 2009).

r = Σ(X_i − X̄)(Y_i − Ȳ) / √[Σ(X_i − X̄)² · Σ(Y_i − Ȳ)²]    (Equation 1)
where X and Y are the compared variables.
However, the linear correlation coefficient may lead to false conclusions when used as an accuracy index of predictions or simulations. For example, simulated and observed values can be highly correlated even in situations where the simulations overestimate or underestimate what is observed.
Thus, a measure often used to evaluate the accuracy of numerical models is the Mean Squared Error (MSE), defined as the mean of the squared differences between the estimated and the measured values.
The mean squared error is obtained by the expression presented in Equation 2 (Willmott & Matsuura, 2005).

MSE = (1/n) Σ(y_i − ŷ_i)²    (Equation 2)
where y_i is the measured value, ŷ_i is the value obtained from the analyzed model, and n is the number of samples.
In other words, correlation was chosen because it is widely used to evaluate comparisons of this type, while the mean squared error is important because it is a better way to verify the accuracy of the model.
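Both metrics are straightforward to compute. As a minimal sketch (the function names and the sample arrays are illustrative, not data from this study):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two arrays of values."""
    xd, yd = x - x.mean(), y - y.mean()
    return float((xd * yd).sum() / np.sqrt((xd**2).sum() * (yd**2).sum()))

def mse(measured, predicted):
    """Mean squared error between measured and model values."""
    return float(np.mean((measured - predicted) ** 2))

# Illustrative load capacities in kN (not values from the database):
measured = np.array([1200.0, 980.0, 1500.0, 760.0])
predicted = np.array([1150.0, 1010.0, 1420.0, 800.0])
r = pearson_r(measured, predicted)
error = mse(measured, predicted)
```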
All simulations were made fixing the network topology at a single hidden layer with a sigmoid activation function. In some cases, the learning rate and the number of neurons in the hidden layer (nno) were varied. The learning rate μ mainly controls how fast the network output error approaches the minimum error the problem admits. The number of neurons in each layer defines the degree of nonlinearity the network can capture in the data presented to it. It is worth remembering that the values of such parameters are chosen empirically, which justifies performing several tests to verify which parameters best fit the problem under analysis.
In the simulations performed, the number of neurons in the input layer was always equal to the number of inputs, and only one neuron was considered in the output layer. From the total data set, 202 values were randomly chosen for network training and 30 for testing.
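The random 202/30 split described above can be sketched as follows (the function name and the seed are illustrative assumptions, not values from the original study):

```python
import numpy as np

def split_train_test(n_samples, n_train, n_test, seed=None):
    """Randomly partition sample indices into disjoint training and test sets,
    as done here with 202 training and 30 test load tests out of 233."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)   # random ordering of all sample indices
    return idx[:n_train], idx[n_train:n_train + n_test]

train_idx, test_idx = split_train_test(233, 202, 30, seed=42)
```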

Results and Discussion
First, models were tested using hammer weight, hammer drop height, pile length, pile diameter and permanent penetration per blow as parameters, without including the modulus of elasticity. The comparisons between ANN results and load tests are presented in the figures, and the optimal weights obtained in each simulation are also presented.
For a better understanding of the weight matrices, it must be kept in mind that the columns of w1, for example, refer to the coefficient vectors of each input variable, ordered as follows: hammer weight (W), hammer drop height (H), pile length (L), pile diameter (D), permanent penetration of the pile caused by the application of the last hammer blow (S), and bias (a characteristic value of ANN models).
The comparison of the ANN results (using learning rate μ=0.1000 and nno=8) with the actual measured values is shown in Figure 1. In this first attempt the MSE obtained was 2.0689x10^6 and the correlation reached 0.9415.

Figure 1. Comparison between actual values and ANN outputs (not including modulus of elasticity, μ=0.1000 and nno=8).
Analyzing Figure 1, it can be noticed that the network obtained an excellent approximation of the values, except for sample 15, which showed a significant difference between the actual and estimated values. The optimal weights obtained in this simulation were the matrices w1 and w2, where w1 is always the matrix of hidden layer weights and w2 always the output layer weight matrix.
The weights obtained in the first simulation indicate that the variables whose coefficients contribute most in absolute terms are hammer weight (W) and hammer drop height (H) (first and second columns of w1, respectively). In the second simulation, the greatest discrepancies between the actual values and those observed at the network output occurred for samples 22 and 24; for the other samples the network made a good prediction.
The optimal weights obtained in this simulation were the matrices w1 and w2, with the same meanings described in the previous case.
For the second simulation, the columns of the w1 weight matrix suggest that the hammer weight coefficients had, on average, the greatest contribution to the model obtained.
The result of the comparisons using learning rate μ=0.9000 and nno=8 is shown in Figure 3. In this case the MSE was much worse than in the previous two attempts, reaching 5.4141x10^6, while the correlation remained close to that of the second test, reaching 0.7758. Observing Figure 3, although the neural model performed good approximations overall, for one of the samples (number 21) the difference between the measured load capacity and the network output is clearly unacceptable.
The optimal weights obtained in this simulation were the matrices w1 and w2. Observing the weight matrices of the third simulation, a certain homogeneity is perceived among the coefficient vectors, but, as in the first simulation, the variables hammer weight (W) and hammer drop height (H) present higher coefficient values.
In the following simulations, the modulus of elasticity of the pile (in kN/m²) was added to the ANN input set.
The results obtained using learning rate μ=0.1000 and nno=2 are shown in Figure 4. In this simulation the MSE was 2.8504x10^6 and the correlation 0.9005. The pattern verified in Figures 1 to 3 is repeated in Figure 4: the neural network performs good approximations in general, but for some isolated samples the performance is unsatisfactory. In this simulation, in particular, the network fails to approximate samples 2 and 12.
The optimal weights obtained in this simulation were the matrices w1 and w2. The w1 weight matrix of the fourth simulation indicates that the second column has, on average, higher values than the others, indicating once again that the hammer drop height is, in fact, an important piece of information for predicting load capacity. In Figure 5, the results presented demonstrate the best network performance among the simulations performed: the approximation is well performed by the model for all samples, a fact ratified by the correlation and error values mentioned above.
The optimal weights obtained in this simulation were the matrices w1 and w2, again the hidden layer and output layer weight matrices, as previously described. Finally, the last simulation was performed using learning rate μ=0.5000 and nno=2 (Figure 6). The analysis metrics indicated MSE=1.3211x10^6 and correlation=0.9433. In this last simulation, it was again observed that the network fails in the prediction for some samples (in this case, samples 21 and 22).
The optimal weights obtained in this simulation were the matrices w1 and w2. In this sixth and last simulation, one fact draws attention: the coefficient of the variable "permanent penetration of the pile caused by the application of the last hammer blow" presents, in absolute value, the highest value among the coefficients, but contributes with a negative sign to the final value. Even so, the numerical superiority of the coefficients of the variables hammer weight (W) and hammer drop height (H) remains, which confirms their importance.
It is important to point out that, initially, the ANN was trained with 8 neurons in the hidden layer and the results were generally satisfactory both in terms of error and of correlation. On the other hand, when the number of neurons was decreased to 2, the test results were still satisfactory, as shown in the figures. This fact suggests that the approximation of load capacity values is a problem "close to linearity".
Comparing the mean squared errors and correlations, the best result obtained by the ANN without the modulus of elasticity used learning rate μ=0.5000 and nno=8. It is noteworthy that in this simulation the correlation was not very high (0.78), but since the main objective is the minimization of the mean squared error, this configuration yielded the smallest error (1.4x10^6) among the models that did not include the modulus of elasticity.
When the information about the modulus of elasticity of the pile was added there was an improvement in the results.
The correlation reached 0.96 and the MSE 0.8x10^6 when μ=0.9 and nno=4 were used. This fact highlights the importance of knowing the value of the modulus of elasticity in advance for the calculation of load capacity.
In summary, the best result was obtained using the model with 4 neurons in the hidden layer (nno=4), learning rate μ=0.9 and the modulus of elasticity included in the input parameters. With this topology, a correlation of 0.96 was obtained between the predicted load capacity values and the actual data, and the mean squared error was the lowest found in all simulations (0.8x10^6). It was also verified that the hammer drop height obtained, in general, the highest weight values (second column of w1).
Pessoa (2018) compared the results of five dynamic formulas (Janbu, Danish, Gates, WSDOT, FHWA) with load test values and found the WSDOT formula to give the best prediction, with MSE=3.21x10^6 and correlation 0.84.
These results reinforce that the ANN estimates (especially when including the modulus of elasticity, with μ=0.9 and nno=4) are significantly better than such formulas, which, although admittedly inaccurate, are traditionally used in geotechnical practice.
As the main proposal of this work is to present a computational model based on neural computing, an algorithm is presented in pseudocode (Table 2) that can be implemented in any programming language, or even in Excel, enabling anyone to use the results presented here in practice.
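In the same spirit as the pseudocode of Table 2, a single forward pass of the trained network can be sketched as below. The weight values shown are placeholders only; in practice the matrices w1 and w2 must be taken from the simulations reported in this article, and the linear output neuron and absence of input scaling are our assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_capacity(inputs, w1, w2):
    """Forward pass of the trained single-hidden-layer MLP.
    inputs: values of W, H, L, D, S (and optionally E), in the network's scale;
    w1: hidden-layer weight matrix (one column per input plus the bias);
    w2: output-layer weight vector."""
    x = np.append(inputs, 1.0)   # append the bias input
    h = sigmoid(w1 @ x)          # hidden layer with sigmoid activation
    return float(w2 @ h)         # single output neuron: predicted capacity

# Placeholder weights for a 5-input, 4-hidden-neuron network (illustrative only):
w1 = np.zeros((4, 6))
w2 = np.ones(4)
q_ult = predict_capacity(np.array([50.0, 2.0, 12.0, 0.4, 0.01]), w1, w2)
```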

Conclusion
The major achievement of this work was obtaining a model whose results, considering the values obtained in the comparison metrics, are very good. The model therefore shows potential to be adopted as a method for predicting ultimate load capacity.
As the main objective of this work is to present a computational model based on neural computing, all the synaptic weight matrices were made available, which allows the proposed model to be implemented in any programming language, so that any user can apply it in practice.
Despite the good results obtained by the ANN, it is worth mentioning that an important limitation for this type of model was the amount of data available for simulation. We believe that with a larger database the study could further improve these results. Even with this limitation, however, the neural model presented encouraging results regarding the prediction of load capacity in foundations.
Thus, regarding the results obtained here, future studies should include a refinement of the model through the expansion of the database used for the input values. It would also be important to compare the accuracy of the proposed model with that of other prediction methods, such as empirical and semi-empirical formulas and (mainly) dynamic formulas (because they are based on the same input parameters).
In addition, although the results obtained by the ANN training are very promising, it is worth investigating, in future studies, simulations using other learning algorithms, such as Momentum and Levenberg-Marquardt, and comparing them with gradient descent.