Artificial Intelligence implemented to recognize patterns of sustainable areas by evaluating the database of socioenvironmental safety restrictions

Several papers recently published on sustainable development have considered new methodologies and techniques for identifying the main criteria, in numeric format, that are useful in formulating possible solutions to the solid waste problem. This paper presents the Mathematical and Computational Modeling Process (PM2C), applied to the determination of control variables related to the selection of areas destined for the construction of landfills, in order to benefit from new analyses and values obtained by methods such as AHP (Analytic Hierarchy Process) and GIS (Geographic Information Systems). The main objective of this paper is the use of Artificial Intelligence (AI), through a Decision Tree strategy, as a method for selecting optimal solutions when choosing the best area for the construction of landfills, with the creation and analysis of new values applied to the scenarios defined in the paper of Andrade and Barbosa (2015). The results, expressed in analytical and graphical forms, show the individual values for each criterion and the new scenarios involved in the phenomena. This paper highlights the importance of incorporating new conditions and criteria to propose a new decision-making rule that simultaneously associates qualitative and quantitative characteristics, related to social and economic effects, applied to the environmental management system. Based on these principles, it was possible to simulate new scenarios that demonstrate, with high precision, the best values of the criteria useful for decision-making in the selection of the optimal area for the implementation of a landfill.


Introduction
In the process of maintaining the environment, efficient waste management is essential. To accomplish this, one of the main items to be defined is the location of landfills. The methodologies that determine the location must focus, fundamentally, on the prevention of risk and threats to the environment caused by short-term pollution. It is well known that the waste disposal technique is based on collection, processing, recycling, and final disposal.
Although each country has its particularities in relation to waste production, according to Khorram et al. (2015), in a large share of cities waste disposal consists of basic collection and deposition in landfills. Priya et al. (2019) state in their paper that, unfortunately, environmental departments have not devoted the necessary attention to the mathematization of this problem in order to find a sui generis area for the disposal of waste.
Considering the depletion of our planet's non-renewable resources, there is an urgent need to further incorporate technology into sustainable development. It is understood that Artificial Intelligence is one of these fundamental technological modalities for this socio-environmental management.
Education is one of the fields in which the transformational potential of computing is still not well recognized. Although Bio-inspired Computing and Artificial Intelligence can be educationally attractive, there is no relevant interest in using software in teaching in order to improve the effectiveness of learning and research (Mayer, 2019).
Landfill screening is an extremely important chapter in the urban planning process; its implementation has direct impacts on the social-environmental health, ecology, and economy of the region.
Maps and geological data should be used in order to locate flaws where the structure of the crust in the region is weak.
Soil maps, road maps and other environmental data sets should also be considered in the location of a safe and ecologically correct waste disposal area.
An optimal region of waste disposal should consider several characteristics in its implementation (Hayeri et al., 2019).
To prevent the pollution of surface water and groundwater, which threatens the ecosystem, these areas should be kept away from places with a record of flooding or shallow groundwater.
In Brazil, despite the National Solid Waste Policy demanding the end of dumps throughout the country, in the state of Pará the landfill in the municipality of Marituba receives solid waste from the capital Belém, from the city of Ananindeua, from Marituba itself, and from Benevides and Santa Barbara. These cities are part of the Metropolitan Region of Belém (RMB), and together they collect around 40 thousand tons per day (Brito et al., 2020).
The Marituba landfill is still in operation. Since its opening, residents of Marituba and adjacent cities have complained about the stench that invades streets, houses, and establishments. The city of Belém and its metropolitan region are still searching for a solution for solid waste management; one such solution is presented in this paper.
Decision methods aim to satisfy one or multiple objectives and are developed based on the evaluation of one or more criteria. According to Costa et al. (2020), the location of the landfill is a multi-criteria process, which considers several attributes and implies the evaluation and selection of suitable areas, based on pre-defined criteria.
For Swacha et al. (2021), there are two basic ways to integrate sustainability and education in Computing: one is to introduce new courses in the computational area whose topics span the two fields; the other is to implement projects and research on the theme of sustainability in classic courses, such as Computer Engineering. The second way seems more suitable. Courses whose disciplines are only indirectly linked to sustainable development issues, such as Algorithms and Data Structures or Introduction to Computer Programming, can also be seen as a response to a recent trend of removing sustainable production proposals from the main contents of study. This paper presents the Mathematical and Computational Modeling Process (PM2C) applied to the determination of control variables related to the selection of areas for the construction of landfills. To this end, a methodology that links technological knowledge to scientific knowledge was used. According to Chalmers (1999) and Crump (2002), Mathematics, Science, Engineering, and Technology coexist in an evolutionary structure, proposing consistent explanations and predictions through systematic experimental results.

Theoretical Framework
A choice between alternatives characterizes a decision, which can represent different information or hypotheses about an area. The criteria serve as norms for finding the best alternatives and represent conditions that can be quantified or evaluated, contributing to decision making (ABNT, 1997). Souto (2009) describes sanitary landfills as the most viable form of final disposal of urban solid waste in Brazil, both technically and economically. However, determining the location of landfills is a difficult and complex process, as multiple criteria must be combined to do so. Sener et al. (2011) performed the selection of suitable sites for the implementation of landfills in the catchment area of Lake Beyşehir, Turkey, using GIS and multi-criteria analysis. The survey determined eight important criteria to be considered when selecting these sites: distance from settlements, distance from surface water, distance from protected areas (ecological, scientific, or historical), geology/hydrogeology, land use, distance from roads, slope, and exposure.
Regarding the implementation of sanitary landfills, Portella and Ribeiro (2014) highlighted that the advantages are great, as they enable an adequate disposal of waste according to engineering and environmental control standards; a high daily absorption of the waste generated; all the conditions for the biological decomposition of the organic matter contained in household or domestic waste; and the treatment of the leachate generated by the decomposition of organic matter and rainfall.
Research, Society and Development, v. 10, n. 10, e212101018841, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i10.18841
As pointed out by Moreira et al. (2016), the use of sanitary landfills is the most common method of disposal of Urban Solid Waste (USW) in Brazil. Costa et al. (2016) propose a safety region, based on a Decision Tree, for the integrated dispatch of a natural gas thermoelectric power generation system. Based on the values obtained by the Decision Tree, it was possible to optimize the integrated dispatch of this system, substantially reducing the environmental impact on the ecosystem. Kumar et al. (2017) describe the attributes that define the area for a landfill as sets of objectives and criteria.
The criteria are the effective factors of the procedure and are operational parameters that can be scored and weighted. The evaluation of attributes provides the data and information necessary to estimate alternatives for the construction of the landfill.
According to Mu et al. (2017), Decision Trees are one of the most effective and widely used techniques in many areas, such as Data Mining, Machine Learning, Image Processing, and Fault Detection. According to the authors, the Decision Tree has become popular not only for its high precision and small number of parameters, but also for the understandability of the classification rules extracted from resource-based examples, a very attractive property in the context of Data Mining.
Wu et al. (2018) indicate in their paper the use of Iterative Multi-criteria Decision Making. According to the authors, this method is based on Prospect Theory and considers the behavior of processes in relation to the factors by which they are affected. Pinheiro et al. (2019) characterized areas restricted to the implementation of landfills in the Pontal do Paranapanema (SP) region using a multi-criteria approach, applying geoprocessing tools, supervised classification, and Boolean logic. The restrictive criteria used were drainage network, water bodies, aerodromes, and Conservation Units (UC). Sodre et al. (2020) proposed a methodology with the objective of evaluating the territory of the city of Castanhal, in the state of Pará, and selecting the area that best fits the current federal norms of urban waste management and sustainability. The paper highlights the fact that the city maintains an open-air dump, without any treatment, which is also inadequate for being close to rivers, flooding regions, and houses. By using the same criteria that made the current dump site inappropriate, with the aid of data from geographic information systems and remote sensing, the ideal location for installing the landfill project was selected through land analysis. Costa et al. (2021) present a Mathematical-Computational model capable of minimizing the operational costs of a multiobjective thermoelectric generation function, proposing the replacement of diesel oil (more pollutant) with natural gas (80% less pollutant than diesel oil). The natural gas and electricity networks are modeled by two groups of non-linear equations and are solved by a hybrid system that applies Newton's method associated with an Artificial Intelligence strategy called the Genetic Algorithm.

Methodology
The methodology adopted in this paper is predominantly structuralist, in accordance with Pereira et al. (2018). The research is established with the investigation of a factual event and then extends to the abstraction of the plan through the design of mathematical and computational models, as described by Costa et al. (2020). Computational modeling was performed with RapidMiner Studio version 9.9, registered under the Educational Edition. The computer used to perform the simulations ran the Windows 10 Pro Edition Version 21H1 operating system, with the following specifications: Ryzen 5 3600 CPU, RX 580 8GB GPU, 16GB of DDR4 memory, and a 480GB SSD. The scientific purpose is represented and, finally, the result of the investigation is shown, linked to a priori information about reality, idealized and correlated with social, environmental, and economic conditions and restrictions.

Bio-inspired Computing
Bio-inspired Computing represents a set of different studies in Computer Science, Biology and Mathematics, in addition to being a field of study of connectionism and social behavior. The Bio-inspired Computational Optimization Algorithms approach is based on the principle of biological evolution of nature to develop more robust computational techniques. In recent years, Bio-inspired Optimization Algorithms have been used in Machine Learning to address optimal solutions in solving complex problems in science and engineering.
Bio-inspired Computing differs from Artificial Intelligence (AI) in that it implements an evolutionary approach to learning, in contrast to the creationist methodology of AI (Malladi & Shyamala, 2015).
This method can be applied from information processing and decision making to optimization algorithms. Computational Intelligence techniques have expanded into several areas, such that, over the last decades, new methods and algorithms have been developed for the most diverse fields and applications, for example: Genetic Algorithms, Artificial Neural Networks, Evolutionary Algorithms, and Fuzzy Logic.
Due to the popularization and expansion of the technology, it is expected that in the coming years, intelligent optimization algorithms will be increasingly effective in solving problems in different areas, such as: Engineering, Medicine, Space and many others.
Inspired by the working of human memory, Krestinskaya and James (2016) proposed a new Bio-inspired Algorithm to store Hierarchical Temporal Memory (HTM) features detected in images. The proposed algorithm was tested on face recognition using data from the AR face database. The simulation results showed that the proposed algorithm offers greater precision in facial recognition when compared to conventional methods.
With the beginning of the application of Bio-inspired and Biomimetic strategies to the development of neural probes, Yang et al. (2019) present in their article a Bio-inspired design for Neuron-Like Electronic Neural Probes (NeuE), whose main building blocks mimic the subcellular structural characteristics and mechanical properties of neurons.

Artificial Intelligence
Artificial Intelligence is a branch of science that seeks, through various technologies, to simulate processes in Nature aimed at solving problems. It can be found in different forms, such as in technical infrastructure, in processes, or in a product for end users. The profound changes brought about by AI in modern society are already evident in the way we live and work.
With the evolution of Computing, Artificial Intelligence has been gaining more space, as its development promoted a great advance in computational analysis, making possible the creation of technologies such as Augmented Reality, Natural Language Processing, Machine Learning, and Speech Recognition, among others, which allow its use in companies from different segments to support smarter decision-making (Lu et al., 2017). Vaisy et al. (2020) identified seven significant AI applications for the COVID-19 pandemic. This shows that this type of technology plays an important role in detecting clusters of cases and predicting where the virus will spread in the future, through the collection and analysis of previous data.

Decision Tree
The central concept used in this paper is the Decision Tree: a predictive statistical model of supervised learning used for data classification and prediction. For Garcia (2004, p. 34), Decision Trees are a simple and effective way to represent knowledge.
They are based on the divide-and-conquer approach, that is, on the successive division of the set of training examples into several subsets, until each subset belongs to a single class, or until one class is the majority and no further divisions are needed.
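The divide-and-conquer recursion can be sketched in plain Python. This is an illustrative toy (exhaustive threshold search, with the misclassification count as a crude impurity proxy), not RapidMiner's implementation; all names are ours:

```python
from collections import Counter

def majority(labels):
    """Most common class in a subset of labels."""
    return Counter(labels).most_common(1)[0][0]

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively divide the examples until a subset is pure (or depth runs out)."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)  # leaf: pure subset or majority class
    best = None
    for feat in range(len(rows[0])):
        for t in sorted({r[feat] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[feat] <= t]
            right = [i for i in range(len(rows)) if i not in left]
            if not left or not right:
                continue
            # impurity proxy: points misclassified by a majority vote in each half
            err = (sum(labels[i] != majority([labels[j] for j in left]) for i in left)
                   + sum(labels[i] != majority([labels[j] for j in right]) for i in right))
            if best is None or err < best[0]:
                best = (err, feat, t, left, right)
    if best is None:
        return majority(labels)
    _, feat, t, left, right = best
    return {"feat": feat, "thr": t,
            "le": grow([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            "gt": grow([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth)}

def predict(tree, row):
    """Follow threshold tests from the root to a leaf."""
    while isinstance(tree, dict):
        tree = tree["le"] if row[tree["feat"]] <= tree["thr"] else tree["gt"]
    return tree
```

On a trivially separable set such as `rows = [[0], [1], [2], [3]]`, `labels = ["A", "A", "B", "B"]`, a single split at the threshold 1 already makes both subsets pure, so the recursion stops immediately, which is exactly the stopping rule described above.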
As pointed out by Crepaldi et al. (2010), the main advantage of the Decision Tree is that it makes decisions considering the most relevant attributes, as well as being understandable to most people. By choosing and presenting the attributes in order of importance, Decision Trees allow users to know which factors are the most influential. Hasan et al. (2018) used a Decision Tree as a means of predicting student performance and helping those involved in evaluating the teaching process of an E-Commerce Technologies module. In the paper, the performance of 8 Decision Tree algorithms was evaluated on a database of 22 students, composed of the academic data of each student and the time spent on the Moodle online platform. Sathiyanarayanan et al. (2019) used supervised machine learning and a Decision Tree to identify breast cancer. In addition to the Decision Tree (DT) algorithm, the K-Nearest Neighbors (KNN) method was also used for accuracy comparison.
The results reveal that, despite the 97% accuracy of the KNN method, the 99% maximum accuracy of the DT is much more reliable when applied together with the supervised machine learning method to predict the presence of cancer. The Decision Tree is a way to improve decision-making in clinical practice when using large datasets. Ramadhan et al. (2020) performed a comparative analysis of accuracy between the K-Nearest Neighbors (KNN) algorithm and the Decision Tree (DT) algorithm in the detection of DDoS attacks. Using the CICIDS2017 dataset, it was possible to verify that, even with KNN's 98.94% accuracy, the DT method reached 99.91% accuracy, showing that it is the better method for detecting DDoS attacks.

Decision Matrix
Based on the Criteria comparison matrix defined by Andrade and Barbosa (2015), which used the Analytic Hierarchy Process (AHP) method and multi-criteria analysis to determine the attributes for each analyzed variable, Costa et al. (2020) created a non-linear mathematical model that adjusted the values of the variables xi previously determined by the AHP methodology. Table 1 presents the values used in the mathematical modeling by Costa et al. (2020). This table was based on the matrix of Andrade and Barbosa (2015). According to Costa et al. (2020), "each variable xi assumes values related to the relevance level of the criterion calculated via AHP. The criteria are classified into geographic areas and environmental factors. These studied areas were called scenarios". For the analytical and graphical presentation of the elements, Costa et al. (2020) used the Nonlinear Multiple Regression Method (MRMNL), in which yi corresponds to the dependent variables and xi to the independent variables. The independent variables were named and organized in Table 2.

Mathematical Modeling
The Decision Matrix defined by Andrade and Barbosa (2015) contains 11 scenarios, each described by 11 variables. For the induction of the decision tree, this number of scenarios proved insufficient, hence the generation of new scenarios. Therefore, each case study adopted a criterion (Arithmetic Mean, Geometric Mean, or Standard Deviation) to create new scenarios.

Case Study A:
Case study A used the simple Arithmetic Mean as the base criterion to produce new scenarios and split validation as the validation method.
The Arithmetic Mean can be obtained by dividing the sum of all the values of a numerical set by the total number of elements in this set, according to equation 1:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad (1)

where \bar{x} represents the value of the Arithmetic Mean, x_i the value of each element, and n the total number of elements.
Tests 1A and 2A: The first two tests included the insertion of 13 new scenarios (12 to 24), seeking to match the quantities for each skill, totaling 24, as shown in Table 3. The new values were determined based on values already existing in the work of Andrade and Barbosa (2015). These values were increased or decreased by up to 0.2, evenly distributed between the variables, always taking care to maintain the value of the Arithmetic Mean related to the skill of the scenario chosen as the base.
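One way to produce a mean-preserving perturbation of the kind described is to transfer the same delta from one variable to another, so the sum, and therefore the Arithmetic Mean, is unchanged. This pairwise scheme and all names are ours; the paper distributes the increments evenly across the variables:

```python
import random

def new_scenario(base, max_delta=0.2, rng=None):
    """Perturb a base scenario while preserving its arithmetic mean:
    add a delta (at most max_delta) to one variable and subtract the
    same delta from another, so the sum of the variables is unchanged."""
    rng = rng or random.Random(0)
    s = list(base)
    i, j = rng.sample(range(len(s)), 2)  # two distinct variable positions
    d = rng.uniform(0, max_delta)
    s[i] += d
    s[j] -= d
    return s
```

Because the two changes cancel, the new scenario keeps the skill-defining mean of its base scenario, which is the invariant the tests above rely on.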
Test 3A: Unlike Tests 1A and 2A, which have 24 scenarios, seeking to match the amount of skill between them, Test 3A has 22 scenarios: 11 base scenarios (Andrade and Barbosa, 2015) and 11 new ones based on these, as shown in Table 4. The new scenarios have the same amount of skill as the base scenarios, with 4 HIGH, 4 MEDIUM, and 3 LOW. The construction of Table 4 is based on Table 3, differing only in the LOW skill scenarios, these being 3 instead of 5. The LOW skill scenarios from Table 3 used in Table 4 are 12, 13, and 14.
Test 4A: Like Test 3A, Test 4A has 22 scenarios, with 11 base scenarios (Andrade and Barbosa, 2015) and 11 new ones based on these, as shown in Table 5. The LOW skill scenarios from Table 3 used in Table 5 are 12, 13, and 16.
Test 5A: Just like Test 4A, Test 5A has 22 scenarios, with 11 base scenarios (Andrade and Barbosa, 2015) and 11 new ones based on these, as shown in Table 6. The construction of Table 6 is based on Table 3, differing only in the LOW skill scenarios, these being 3 instead of 5.
The LOW skill scenarios in Table 3 used in Table 6 are 13, 14 and 15.

Case Study B:
Case study B adopted the Geometric Mean as the base criterion to produce new scenarios. The Geometric Mean was calculated according to equation 2:

\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i} \quad (2)

where \bar{x}_g represents the value of the Geometric Mean.
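Equation 2, the n-th root of the product of the n values, can be sketched in a few lines of Python (the function name is ours; the paper's calculations were done with RapidMiner and spreadsheets):

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values (equation 2)."""
    return math.prod(values) ** (1.0 / len(values))
```

For example, the geometric mean of 2 and 8 is the square root of 16, i.e. 4, whereas their arithmetic mean is 5; this is why case studies A and B can classify the same scenario differently.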
Test 1B: The first test included the insertion of 13 new scenarios (12 to 24), totaling 24, as shown in Table 7. Of the new data, 11 were created by adding or subtracting 0.2 to or from each variable of the base scenarios, maintaining the Geometric Mean value. The remaining 2 scenarios were inserted to balance the number of cases with LOW aptitude; for this, 11 values used in the 22 previous scenarios were chosen, which resulted in one of the Geometric Means of LOW aptitude.
Test 2B: Test 2B had a table similar to Test 1B, containing 24 scenarios, as shown in Table 8. Of the data, 11 were from Table 1 and 13 were new scenarios created to enrich the database. The only difference is the constant chosen to add to or subtract from the variables, which went from 0.2 to 0.4.

Case Study C:
According to Martins (2013), "Standard Deviation of a sample (or collection) of data, of a quantitative type, is a measure of data dispersion relative to the mean, which is obtained by taking the square root of the sample variance".
The dispersion of the observations that make up a sample can be characterized by the deviations of each observation in relation to the mean (x_i − \bar{x}), which can take positive or negative values; the sum of the deviations of each observation in relation to the sample mean is zero.
Based on this information, case study C used equation 3 to calculate the Standard Deviation and adopted it as the base criterion for producing new scenarios. Cross Validation and Split Validation were adopted as the validation methods. The Standard Deviation is calculated as:

s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \quad (3)

where s represents the Standard Deviation value of the data involved.
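Equation 3, the square root of the sample variance with an n − 1 denominator, can be sketched as follows (function name ours):

```python
import math

def sample_std(values):
    """Square root of the sample variance (equation 3, n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    # deviations (x_i - mean) sum to zero, so they are squared before summing
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
```

The squaring step is what makes the measure usable: as noted above, the raw deviations cancel to zero, so only their squared magnitudes carry dispersion information.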
Test 1C: The first test of case study C added 22 new scenarios (12 to 33), based on the Standard Deviation values from the original table of 11 values defined by Andrade and Barbosa (2015), as shown in Table 9. Test 2C had 33 new scenarios (12 to 44), as shown in Tables 10 and 11.

Computational Modeling
Computational Modeling and Simulation is a fundamental technique for this study, with the purpose of delimiting the ideal values and variables that configure a phenomenon. Paula et al. (2020) used the FLUENT 14 software in computational simulations of soybean particle flow, making it possible to identify parameters to improve the structures used in storage units for agricultural products.
GEOSLOPE was the software used by Magalhães et al. (2020) in the investigation of landfill stability, determined through a global analysis using the Bishop method.
In Costa et al. (2020), the software chosen was used to solve equations, fit surface data, and find a function that would model the variables involved, optimizing the solution of the Criteria Matrix. The choice of language was made because it is a high-performance, scientifically validated language that provided all the tools necessary to carry out the research.
The software RapidMiner was chosen to carry out the research proposed in this paper, as it provides a wide range of statistical evaluation methods, such as correlation analysis for regression, classification, and clustering procedures, as well as parameter optimization. Such methods can be used for different applications and data types, such as text, images, audio, and time series analysis. The analyses can be fully automated and the results viewed in different ways.

Decision Tree Implementation
For the Decision Tree construction, six operators were used, according to the following specifications:

a)
Read Excel - This operator can be used to load data from Microsoft Excel spreadsheets.

b)
Split Validation - This operator performs a simple validation, i.e., it randomly splits the ExampleSet into a training set and a test set and evaluates the model. It performs a split validation in order to estimate the performance of a learning operator (usually on unseen data sets) and is mainly used to estimate how accurately a model (learned by a particular learning operator) will perform in practice.
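A minimal sketch of what this operator does, outside RapidMiner (the ratio 0.7 is the operator's default split ratio; the function name is ours):

```python
import random

def split_validation(examples, split_ratio=0.7, seed=0):
    """Randomly split an example set into training and test subsets,
    mirroring the behavior of a Split Validation step (default ratio 0.7)."""
    shuffled = examples[:]              # never shuffle the caller's list in place
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With a 0.7 ratio on 24 scenarios, roughly 16 would be used for training and 8 for testing; changing the ratio to 0.6, as in Case 2A below, shifts more scenarios into the test set.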

c)
Cross Validation - It is mainly used to estimate how accurately a model (learned by a particular learning operator) will perform in practice. It has two subprocesses: a Training subprocess and a Testing subprocess. The input ExampleSet is partitioned into k subsets of equal size. Of the k subsets, a single subset is retained as the test data set (i.e., the input of the Testing subprocess). The remaining k − 1 subsets are used as the training data set (i.e., the input of the Training subprocess).
The cross-validation process is then repeated k times, with each of the k subsets used exactly once as the test data.
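The k-fold partition scheme described above can be sketched as follows (names ours; RapidMiner's operator additionally supports stratified sampling, which this sketch omits):

```python
def k_folds(examples, k):
    """Partition examples into k near-equal subsets; yield (train, test)
    pairs so that each subset serves exactly once as the test set while
    the remaining k - 1 subsets form the training set."""
    folds = [examples[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test
```

Every example appears in exactly one test fold across the k repetitions, which is why cross validation reports predictions for the whole data set, a point that matters in Cases 1C and 2C below.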

d)
Decision Tree - This operator generates a decision tree model, which can be used for classification and regression.

e)
Apply Model - A model is first trained on an ExampleSet by another operator, which is often a learning algorithm. Afterwards, this model can be applied to another ExampleSet. Usually, the goal is to get a prediction on unseen data or to transform data by applying a preprocessing model.

f)
Performance - This operator is used for the statistical performance evaluation of classification tasks. It delivers a list of performance criteria values for the classification task.
The following figures represent the programming of the operators mentioned above, as well as their connections. These connections are in accordance with the methodology used for this DT, specifically.

Results and Discussion
Case 1A: The parameters of the operators used in this case are in their default format. After executing the process with the data presented in Table 3, we obtained the DT presented in Figure 5 and Table 12. According to Figure 5, Rules 1 and 2 define the control variables x4, x1, and x6, in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (4) and (5), resulting in an accuracy of 83.33%.

Case 2A:
In order to improve the accuracy of the results obtained through Table 3, the parameters of the operators used in this case are in their default format, with the exception of the Split Ratio, which was changed from its default of 0.7 to 0.6. After executing the process with the data presented in Table 3, we obtained the DT presented in Figure 6 and Table 13. According to Figure 6, the DT has the same rules presented by the DT in Figure 5, as well as their expressions. The Performance Vector in Table 13 shows a Confusion Matrix; this table helps understand the correct and incorrect predictions made by the algorithm and shows the accuracy, individual class precision, and recall. According to Table 13, we obtained for the HIGH, MEDIUM, and LOW skills a class precision of 75.00%, 100%, and 100% respectively, and a class recall of 100%, 100%, and 66.67% respectively, resulting in an accuracy of 88.89%.
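The class precision, class recall, and accuracy figures reported in the Performance Vector tables can all be recomputed from the confusion matrix. A sketch of those three formulas (names ours):

```python
def confusion_metrics(truth, pred, classes):
    """Per-class precision and recall plus overall accuracy, as reported
    in a Performance Vector: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    prec, rec = {}, {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        prec[c] = tp / (tp + fp) if tp + fp else 0.0
        rec[c] = tp / (tp + fn) if tp + fn else 0.0
    acc = sum(t == p for t, p in zip(truth, pred)) / len(truth)
    return prec, rec, acc
```

For instance, if one HIGH scenario is predicted as MEDIUM and everything else is correct, HIGH keeps a precision of 100% but its recall drops, while MEDIUM keeps full recall but loses precision, the asymmetry visible in several of the tables below.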

Case 3A:
The parameters of the operators used in this case are in their default format. After executing the process, with the data presented in Table 4, we obtained the DT presented in Figure 7 and Table 14. According to Figure 7, Rules 1 and 2 define the control variables x4 and x1 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (6) and (7).

Case 4A:
The parameters of the operators used in this case are in their default format. After executing the process with the data presented in Table 5, we obtained the DT presented in Figure 8 and Table 15. According to Figure 8, Rules 1, 2, and 3 define the control variables x6, x9, x2, x1, and x4 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (8), (9), and (10). Table 15 shows the Performance Vector; this table helps understand the correct and incorrect predictions made by the algorithm and shows the accuracy, individual class precision, and recall. According to Table 15, we obtained for the HIGH, MEDIUM, and LOW skills a class precision of 25.00%, 100%, and 0.00% respectively, and a class recall of 50%, 50%, and 0.00% respectively, resulting in an accuracy of only 33.33%.

Case 1B:
The parameters of the operators used in this case are in their default format. After executing the process, with the data presented in Table 7, we obtained the DT presented in Figure 10 and Table 17. According to Figure 10, Rules 1 and 2 define the control variables x4 and x1 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (14) and (15).

Case 2B:
The parameters of the operators used in this case are in their default format. After executing the process, with the data presented in Table 8, we obtained the DT presented in Figure 11 and Table 18. According to Figure 11, Rules 1 and 2 define the control variables x4, x9 and x1 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (16) and (17).
x4 ≤ 1.500 → x9 ≤ 1.700 → x1 ≤ 1.500 (17)
The Performance Vector in Table 18 shows a Confusion Matrix; this table helps understand the correct and incorrect predictions made by the algorithm and shows the accuracy, individual class precision, and recall. According to Table 18, we obtained for the HIGH, MEDIUM, and LOW skills a class precision of 100%, 100%, and 100% respectively, and a class recall of 50%, 100%, and 100% respectively, resulting in an accuracy of 83.33%.
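Expressions such as (17) chain the threshold tests along a single root-to-leaf path of the tree: a scenario reaches the HIGH leaf only if every test on the path holds. Read as a predicate, expression (17) can be sketched as (thresholds from the expression above; function name ours):

```python
def rule_17(x):
    """Root-to-leaf path of expression (17): x4 <= 1.500, then x9 <= 1.700,
    then x1 <= 1.500. x maps a variable index to that variable's value."""
    return x[4] <= 1.500 and x[9] <= 1.700 and x[1] <= 1.500
```

Any scenario failing even one of the three tests is routed down a different branch of the tree and therefore receives a different skill prediction.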

Case 1C:
The parameters of the operators used in this case are in their default format, except for the Split Ratio, which used a value of 0.8. After executing the process with the data presented in Table 9, we obtained the DT presented in Figure 12 and Tables 19 and 20. According to Figure 12, which is the same in the results of both validations, Rules 1 and 2 define the control variables x4, x1, and x6 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (13) and (14). It is interesting to note that the Performance Vector using Cross Validation has more prediction samples than the one using Split Validation; this is due to the partitioning into k subsets of equal size used by that validation method.

Case 2C:
The parameters of the operators used in this case were kept at their default values. After executing the process with the data presented in Table 10, we obtained the DT presented in Figure 13 and Tables 21 and 22. According to Figure 13, which is identical for both validation methods, Rules 1 and 2 define the control variables x4, x1 and x2 in descending order of their degrees of importance. The HIGH prediction skill is highlighted in the figure, and its rules are described, respectively, by expressions (20) and (21).
The Performance Vector in Table 21 shows a Confusion Matrix, which helps in understanding the correct and incorrect predictions made by the algorithm and reports the accuracy, individual class precision, and recall. According to Table 21, we obtained for the HIGH, MEDIUM and LOW skills a class precision of 100%, 100% and 100%, respectively, and a class recall of 50%, 100% and 100%, respectively, resulting in an accuracy of 100%. In Case 2C, the only parameter changed from the default Cross Validation configuration was the number of folds, from 10 to 11. It can be observed that Table 22 contains more prediction values for all three aptitudes than Table 21. The explanation for this fact lies in how this validation technique partitions the data between training and testing values.
The Performance Vector in Table 22 shows a Confusion Matrix obtained with Cross Validation, which helps in understanding the correct and incorrect predictions made by the algorithm and reports the accuracy, individual class precision, and recall. According to Table 22, we obtained for the HIGH, MEDIUM and LOW skills a class precision of 100%, 94,12% and 100%, respectively, and a class recall of 93,75%, 100% and 100%, respectively, resulting in an accuracy of 97,63% +/- 7,54% and a Micro Average of 97,63%. It is interesting to note that the Performance Vector obtained with Cross Validation contains more prediction samples than the one obtained with Split Validation, because this validation method partitions the data into k subsets of equal size. To achieve the values in the table above, a single default Cross Validation parameter was modified: the number of folds was changed from 10 to 14. It can be observed that Table 24 contains more prediction values for all three skills than Table 23. The explanation for this fact lies in how this validation technique partitions the data for its sub-processes.
The Performance Vector in Table 24 shows a Confusion Matrix obtained with Cross Validation, which helps in understanding the correct and incorrect predictions made by the algorithm and reports the accuracy, individual class precision, and recall. According to Table 24, we obtained for the HIGH, MEDIUM and LOW skills a class precision of 87,50%, 88,89% and 100%, respectively, and a class recall of 87,50%, 100% and 83,33%, respectively, resulting in an accuracy of 92,86% +/- 18,16% and a Micro Average of 90,91%. It is interesting to note that the Performance Vector obtained with Cross Validation contains more prediction samples than the one obtained with Split Validation, because this validation method partitions the data into k subsets of equal size.
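The role of the fold count can be sketched with scikit-learn's KFold as a stand-in for RapidMiner's Cross Validation operator; the scenario count below is illustrative. Each scenario lands in exactly one test fold, so the aggregated confusion matrix always covers the full data set, while the fold count controls how the test partitions are sized.

```python
# How k-fold partitioning distributes test predictions across folds.
import numpy as np
from sklearn.model_selection import KFold

n = 42  # illustrative number of scenarios
totals = {}
for k in (10, 14):
    fold_sizes = [len(test) for _, test in KFold(n_splits=k).split(np.zeros(n))]
    totals[k] = sum(fold_sizes)          # every sample is tested exactly once
    print(f"k={k}: fold sizes {fold_sizes}, total {totals[k]}")
```

Whatever the value of k, the fold test sizes sum to the full number of scenarios, which is why cross-validated Performance Vectors aggregate predictions over the entire data set.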

Conclusion
The initial impulse for this work came from the current need to assist managers in choosing a suitable location to build landfills, as this process involves the analysis of various characteristics of the region. This screening is extremely important to minimize social, environmental and economic impacts in the regions that surround landfills.
Based on the characteristics of the problem, we chose to work with decision trees, a supervised learning technique for classification, as they provide a simple method that builds a predictive structure from the most relevant attributes of the problem.
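The approach can be sketched as follows, assuming scikit-learn's DecisionTreeClassifier in place of RapidMiner; the criteria names (x1..x4), the threshold, and the labels are illustrative, not the paper's data.

```python
# Minimal supervised decision-tree sketch: fit a classifier on scenario
# criteria and print its rules, analogous to the DTs in Figures 10-13.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(30, 4))             # 30 scenarios, 4 criteria
y = np.where(X[:, 3] <= 1.5, "HIGH", "LOW")     # aptitude driven by x4 (toy rule)

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3", "x4"]))
```

The printed rules take the same chained-threshold form as the article's expressions, e.g. a root split on x4 leading to the HIGH class, which is how the most relevant attribute surfaces at the top of the tree.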
All decision trees shown in this article were implemented using RapidMiner, a data science and artificial intelligence software. As it is a tool properly validated in business and scientific environments, offering a collection of algorithms that can be easily implemented and remodeled through block programming, several tests could be performed with different parameters.
This article also showed the use of the arithmetic mean, geometric mean and standard deviation to create extra scenarios for the database, since the initial amount of data proved insufficient for working with decision trees.
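One plausible reading of this augmentation step is sketched below; the scenario values and the exact way the statistics are combined into new rows are assumptions for illustration, not the paper's procedure.

```python
# Deriving extra scenarios from the arithmetic mean, geometric mean and
# standard deviation of the original criteria values.
import numpy as np

scenarios = np.array([
    [1.2, 0.8, 2.0],
    [1.6, 1.1, 1.4],
    [2.0, 0.9, 1.8],
])  # rows = original scenarios, columns = criteria (illustrative values)

arith = scenarios.mean(axis=0)                     # arithmetic mean per criterion
geom = np.exp(np.log(scenarios).mean(axis=0))      # geometric mean per criterion
std = scenarios.std(axis=0, ddof=1)                # sample standard deviation

# Append the derived rows to the base set as new synthetic scenarios.
augmented = np.vstack([scenarios, arith, geom, arith - std, arith + std])
print(augmented.shape)
```

Here three original scenarios yield four derived ones, enlarging the training set the tree can learn from while keeping the new rows statistically tied to the originals.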
The tests performed with different numbers of scenarios and parameters in the computational modeling produced promising results with excellent accuracy and precision, validating, from a theoretical and statistical point of view, the use of decision trees to assist in choosing a suitable place for the construction of landfills.
Future work will focus on the application of other bio-inspired techniques. It is expected that unsupervised machine learning methods, such as clustering and autoencoders, will make it possible to identify other intrinsic characteristics of the source scenarios: clustering groups scenarios according to their similar characteristics without regard to labels, while autoencoders attempt to replicate the input data, perhaps enabling the creation of new scenarios through the noise introduced in the encoding and decoding process.