Study of the Impact of Sanitary Decisions on Water Quality Using Bayesian Belief Networks in the Upper Pantanal Wetland Basin, Brazil

Bayesian Belief Network (BBN) modeling of water quality has become popular due to advances in computational techniques. A BBN is a useful tool for modeling the relationship between water quality data and population or urbanization parameters at the watershed scale. The method can combine primary water quality data with decision parameters and help scientists and decision-makers analyze several scenarios for a watershed, including the effect of scale. This paper analyzes and discusses the application of BBNs to the relationship between watershed water quality and sanitary management indicators, using a tributary watershed of the Pantanal Wetland as a case study. BBNs at two scales were constructed using ten years of water quality and sewage management datasets. Both BBNs were responsive and sensitive to water quality parameters. Total Nitrogen and E. coli were the most essential parameters for simulating changes in water quality scenarios. The simulated scenarios revealed structural limitations in the sanitary systems of the Pantanal Wetland cities considered in this study. We strongly recommend a review of the goals for sanitary structure and services and warn of the risk of a sanitary crisis in the Pantanal Wetland.


Introduction
Bayesian Belief Network (BBN) modeling of water quality has become popular due to advances in computational techniques. Several studies have demonstrated the importance of developing these methods, with or without other statistical approaches (Ancione et al., 2020; Farooqi et al., 2020; Kang et al., 2020; Mayfield et al., 2019; Panidhapu et al., 2019; Avila et al., 2018).
Like other emerging countries in the Global South, Brazil faces many challenges related to sanitation and water quality.
As discussed by Borrero-Ramírez and Mosquera-Becerra (2020), the current sanitary crisis in the Global South must be understood in the context of health systems that have undergone significant transformations in recent decades as market-driven actors gained influence over health policy decisions. This is a very complex scenario, and any decision must consider several physical, geographical, and policy variables.
Unlike in well-developed nations, science-based decision tools are urgently needed in these areas. In this context, a BBN is a useful tool for modeling the relationship between water quality data and population or urbanization parameters at the watershed scale (Fasaee et al., 2021; Salman et al., 2021; Forio et al., 2015). The method can combine primary water quality data with decision parameters and help scientists and decision-makers analyze several scenarios for a watershed, including the effect of scale.
Looking more closely at Brazil, and specifically at the Pantanal Wetland, the world's largest wetland, one of the many environmental problems is sewage treatment. Recently, wildfires drew worldwide attention because of their severity and the extent of the burned area (Pivello et al., 2021). However, we draw attention to another emerging disaster: a sanitary crisis. Such a crisis can arise from the disconnect between policy decisions and scientific indicators such as the water quality index (WQI).
A critical issue is modeling the relationship between population, urbanization, and watershed water quality. Decision-makers need a tool to compare different population levels and their impact on water quality goals. This scale effect is particularly significant when considering the practical effect of decision-makers on watershed management. In practice, managers may use the concept of jurisdiction to define the geographic limits of their actions. Several approaches have used hierarchical effects in space and time to understand this (Sha et al., 2014; Zhang et al., 2018; Liu et al., 2019). According to Wan et al. (2014), process-based watershed pollution models have proved useful for simulating complex processes; one example is a Bayesian hierarchical model that investigated the effects of air pollution on health over time.
This paper aims to analyze and discuss the application of a Bayesian Belief Network (BBN) to the relationship between watershed water quality and sanitary management indicators, using a tributary watershed of the Pantanal Wetland as a case study.
Research, Society and Development, v. 11, n. 3, e21011326309, 2022 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v11i3.26309

Study area characterization
The study was carried out in the Vermelho River basin (15º30'/17º15' S and 53º45'/55º00' W), located in the southeastern region of the State of Mato Grosso, Brazil (Figure 1). The basin occupies approximately 150,802 ha (Souza & Loverde-Oliveira, 2014) and is an essential contributor to the Pantanal Wetlands. The main land uses are cattle raising, followed by soybean, corn, and cotton crops, in addition to urbanized areas. The city of Rondonópolis has the largest population in the basin, with 232,491 inhabitants and a demographic density of 47.00 inhab./km².

Water quality database
The Vermelho River water quality dataset was obtained from governmental water quality databases (Secretaria Estadual de Meio Ambiente, SEMA-MT), and the official sanitary management indicators from the national sanitation database (National System of Sanitation Information, SNIS). Complementary hydraulic data (river discharge) were obtained from the National Water Authority (Agência Nacional de Águas, ANA). The study period, 2006 to 2017, was selected on the basis of a previous database consistency study (Garcia et al., 2020). A total of 1778 data points over 14 variables were selected for the study.
The original dataset variables were paired by supervised, manual selection, and inconsistent and no-data records were excluded when assembling the cases. Each selected variable was assigned to a node, and the Conditional Probability Tables (CPT) were then created. To evaluate the watershed scale effect, we created two sets of CPTs: one from the total basin dataset (BBNw) and one from the Rondonópolis city influence area only (BBNc).
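To make the step from cases to a CPT concrete, the following is a minimal sketch, not the procedure used in NETICA. It assumes complete cases and estimates each conditional probability by relative frequency, which is a simplification of the learning step. The cut points, the node name `TN`, and the six cases are hypothetical and are not taken from the SEMA-MT or SNIS data.

```python
from collections import Counter, defaultdict


def discretize(value, low_cut, high_cut):
    """Map a continuous measurement to one of three states."""
    if value < low_cut:
        return "Low"
    if value < high_cut:
        return "Medium"
    return "High"


def estimate_cpt(cases, child, parents):
    """Estimate P(child | parents) by relative frequency over complete cases."""
    counts = defaultdict(Counter)
    for case in cases:
        parent_states = tuple(case[p] for p in parents)
        counts[parent_states][case[child]] += 1
    return {
        parent_states: {
            state: n / sum(counter.values()) for state, n in counter.items()
        }
        for parent_states, counter in counts.items()
    }


# Hypothetical (total nitrogen, WQI state) records, for illustration only.
raw = [(0.4, "Medium"), (0.5, "Medium"), (2.6, "Bad"),
       (3.1, "Bad"), (1.2, "Medium"), (1.4, "Bad")]

# Hypothetical cut points (1.0 and 2.5); the paper set its levels by expert analysis.
cases = [{"TN": discretize(tn, 1.0, 2.5), "WQI": wqi} for tn, wqi in raw]

cpt = estimate_cpt(cases, child="WQI", parents=["TN"])
for parent_states, distribution in sorted(cpt.items()):
    print(parent_states, distribution)
```

Parent-state combinations that never occur in the cases get no row in this sketch, which is one reason a small dataset limits what the network can learn about rare states.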

Conceptual model and basic network algorithm
The essential criteria for defining the network structure are described in Ramin et al. (2012), Wijesiri et al. (2018), and Panidhapu et al. (2020). In summary, we used a supervised conceptual model to design the basic algorithm used in the construction of the BBN (Figure 2). The model was based on a previous study (Silva et al., 2020), which demonstrated the effect of seasonality (rainy and dry seasons) on the water quality variables. The effect of urbanization and sanitation systems on water quality at the river basin scale was discussed in another previous study (Garcia et al., 2020).
The analysis considered the multivariate statistical correlation between the water quality variables across time and space scales. In summary, the major driving forces of water quality in the Vermelho River basin are seasonality, which changes turbidity, suspended solids, and total coliforms, and urbanization and population growth, which mainly change nitrogen, phosphorus, COD, colour, and E. coli. Finally, all these variables change the WQI.

BBN model construction and validation
A Bayesian Belief Network can be formulated as BBN = (G, Θ), where G is a directed acyclic graph (DAG) whose nodes X1, X2, …, Xn represent random variables and whose links represent direct dependencies between these variables, and Θ is the set of BBN parameters P(Xi|Ai) (i = 1, 2, …, n) for each Xi conditioned on Ai, the set of parents of Xi in G. The joint probability P(X) follows from these conditional dependencies, as described by Equation (1):

$$P(X) = P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid A_i) \qquad (1)$$

The goal of this modeling is to create simulation scenarios with which decision-makers can evaluate the impact of the sewage system and urbanization on water quality. The BBN was constructed using primary data from a 10-year dataset. To evaluate the basin-scale effect, we constructed two sets of CPTs from historical datasets: a BBN for the whole watershed (BBNw) and a BBN for the basin of the largest city, Rondonópolis (BBNc).
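As an illustration of how the joint probability is assembled from the per-node conditional probabilities P(Xi|Ai), the sketch below multiplies the CPT entries along a three-node chain. The chain PWS → TN → WQI, the two-state nodes, and every probability value are assumptions made for this example; they are not the structure or the parameters of BBNw or BBNc.

```python
from itertools import product

# Hypothetical three-node chain PWS -> TN -> WQI with illustrative numbers.
p_pws = {"Low": 0.6, "High": 0.4}                  # P(PWS), a root node
p_tn_given_pws = {                                 # P(TN | PWS)
    "Low": {"Low": 0.3, "High": 0.7},
    "High": {"Low": 0.8, "High": 0.2},
}
p_wqi_given_tn = {                                 # P(WQI | TN)
    "Low": {"Bad": 0.3, "Medium": 0.7},
    "High": {"Bad": 0.8, "Medium": 0.2},
}


def joint(pws, tn, wqi):
    """P(PWS, TN, WQI) as the product of each node's P(node | parents)."""
    return p_pws[pws] * p_tn_given_pws[pws][tn] * p_wqi_given_tn[tn][wqi]


# The factorized joint distribution must sum to 1 over all state combinations.
total = sum(
    joint(pws, tn, wqi)
    for pws, tn, wqi in product(("Low", "High"), ("Low", "High"), ("Bad", "Medium"))
)

print(joint("High", "Low", "Medium"))
print(total)
```

The practical benefit of the factorization is that each node only needs a table over its own parents, rather than one table over every combination of all variables.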
In both cases, the two entrance (input) nodes are Population With Sanitary System (PWS), the absolute number of people with access to adequate wastewater disposal, and Water Consumption (WCS), the total volume (m³) of water consumed by the population under study. These entrance nodes were considered the most suitable for our goals because they are clear to decision-makers and other stakeholders who create executive action plans.
To construct the two BBNs for testing the basin-scale effect, one for the whole watershed (BBNw) and one for the basin of the largest city, Rondonópolis (BBNc), we adjusted the cases and the CPTs for the nodes PWS, WCS, and VST. In both cases, we used the validation criteria described below. The other river water quality nodes are the same in both networks.
The final node is the Water Quality Index (WQI), which represents the goal of water quality management. We used the software NETICA (Norsys, version 6.07) to construct and run the BBN. The primary data were used to build the Conditional Probability Tables (CPT), and discretization into three levels was carried out by expert analysis. All nodes and their value characteristics are described in Table 1.
The validation of the BBN model followed Marcot et al. (2006). In summary, we created the CPTs for the nodes and discretized them using expertise-based criteria. All nodes appear with their states (high, medium, or low) based on the maximum, medium, and minimum values. Next, the directed acyclic graph (DAG) was created in NETICA®, with arcs connecting the nodes according to the basic conceptual model described in Figure 2. For network learning, we used the expectation-maximization (EM) algorithm. The WQI node was used as the reference for testing with cases: the prediction accuracy of the BBN model was evaluated with a confusion matrix comparing predicted and actual outcomes. Additionally, the error rate (%), logarithmic loss, quadratic loss, and spherical payoff were used as model quality criteria. Finally, by specialist judgment, the DAG was adjusted by reordering the arcs and nodes to maximize the fit of the network parameters described above. The final BBN is shown in Figure 3.
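The model quality criteria named above can be sketched in a few lines. This is an illustrative implementation of the usual definitions of these scoring rules (averaged over test cases, with the probability assigned to the true state as the key quantity), not a reproduction of NETICA's internal code. The function name `score_predictions` and the three example belief vectors are hypothetical.

```python
import math


def score_predictions(beliefs, actual):
    """Score predicted belief vectors against the observed states.

    beliefs: list of dicts mapping state -> predicted probability
    actual:  list of observed states, same length as beliefs
    """
    n = len(actual)
    errors = 0
    log_loss = quad_loss = spherical = 0.0
    for dist, true_state in zip(beliefs, actual):
        predicted = max(dist, key=dist.get)        # most probable state
        errors += predicted != true_state
        p_true = dist[true_state]
        sum_sq = sum(p * p for p in dist.values())
        # Logarithmic loss: 0 is best, unbounded above.
        log_loss += -math.log(p_true) if p_true > 0 else math.inf
        # Quadratic loss: ranges from 0 (best) to 2.
        quad_loss += 1.0 - 2.0 * p_true + sum_sq
        # Spherical payoff: ranges from 0 to 1 (best).
        spherical += p_true / math.sqrt(sum_sq)
    return {
        "error_rate_pct": 100.0 * errors / n,
        "log_loss": log_loss / n,
        "quadratic_loss": quad_loss / n,
        "spherical_payoff": spherical / n,
    }


# Hypothetical beliefs for the WQI node on three test cases.
beliefs = [
    {"Bad": 0.7, "Medium": 0.2, "Good": 0.1},
    {"Bad": 0.4, "Medium": 0.5, "Good": 0.1},
    {"Bad": 0.6, "Medium": 0.3, "Good": 0.1},
]
actual = ["Bad", "Medium", "Medium"]

scores = score_predictions(beliefs, actual)
print(scores)
```

The error rate only looks at the most probable state, whereas the three other scores also reward or penalize how much probability the network placed on the true state, so they can separate two networks that have the same confusion matrix.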

BBNs performance and validation
We discuss the results in two parts: the performance of the two BBNs and the potential of the water quality and sewage management scenarios. As shown in Table 2, the validation parameters for the two BBNs can be interpreted as a scale effect. The number of cases fell from 126 in BBNw to 64 in BBNc because some data do not cover the watershed area under the influence of Rondonópolis city.
The 3x3 confusion matrix (Table 2) shows the accuracy of both BBNs. The overall error rate can be considered acceptable for both (26.98% and 13.79% for BBNw and BBNc, respectively). In BBNw, the largest error is in the Medium state, which is misclassified at a rate of 16.6%. In BBNc, the largest error is in the Bad state, at 12.5%. In both cases, as expected, the Good state yields no predictions, because the river WQI data contain too few Good cases for the network to learn from.
The WQI of the Vermelho River does not show a Good quality state, as demonstrated by the database analysis.
As discussed by Marcot et al. (2006), the confusion matrix shows the number of known cases that were correctly classified. It is a measure of how well the model captures the data and can be used to interpret the simulations. In this case, the average rate of correct classification across the two models is around 79.5%.
This result also allows us to infer the effect of watershed scale on model accuracy. BBNc is more accurate than BBNw, probably because the effect of Rondonópolis city on water quality is more pronounced than that of the basin as a whole. In other words, the loss of water quality (WQI) can be explained by changes in the population parameters (PWS, WCS, and VST). In BBNw, we attribute the loss of model accuracy to the particular way waste is disposed of. In small Brazilian cities such as those in the Vermelho River basin, urban wastewater is commonly disposed of through septic tanks or open defecation, which does not produce a significant change in surface waters.

Scenario simulations
For the water quality simulations, three major scenarios were defined through the entrance conditions in BBNw: Sewage Services, Sewage Structure, and Water Consumption. The changes in the beliefs of the WQI states were analyzed. Other scenarios were also run (not shown here). The results are shown in Table 4. In summary, high sanitary services and structure reduce the beliefs in the Bad and Medium WQI states and increase the belief in the Good state. On the other hand, high water consumption increases the beliefs in the Bad and Medium states.
These results can be interpreted as the impact of management decisions on water quality. In Brazil, sewage structure is distinguished from sewage services: the structure is the physical extent of the sewage network, whereas the services are the number of people who use it. Another result is that the changes in WQI occur mostly in the Bad and Medium states. As described in Section 2.1, the Good state rarely appears in the dataset, and the network learning captures this.
Finally, Best and Worst scenarios were created by combining the conditions of the entrance nodes. Best scenario: high sewage structure and services with low water consumption. Worst scenario: low sewage structure and services with high water consumption. The results in Table 4 reveal the same tendency as in the other scenarios: the best management practices reduce the beliefs in the Bad and Medium WQI states. However, they do not significantly increase the belief in the Good state.
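A scenario simulation of this kind amounts to fixing the states of the entrance nodes and reading the updated beliefs of the WQI node. The sketch below does this by brute-force enumeration on a toy network. The intermediate node `Load`, the two-state entrance nodes, and all probability values are hypothetical and chosen only to show the mechanics; they do not reproduce the values in Table 4.

```python
from itertools import product

# Hypothetical toy network: PWS and WCS -> Load -> WQI (illustrative numbers only).
PRIORS = {
    "PWS": {"Low": 0.5, "High": 0.5},
    "WCS": {"Low": 0.5, "High": 0.5},
}
P_LOAD = {  # P(Load | PWS, WCS); "Load" is an invented intermediate node
    ("High", "Low"): {"Low": 0.8, "High": 0.2},
    ("High", "High"): {"Low": 0.5, "High": 0.5},
    ("Low", "Low"): {"Low": 0.4, "High": 0.6},
    ("Low", "High"): {"Low": 0.1, "High": 0.9},
}
P_WQI = {  # P(WQI | Load)
    "Low": {"Bad": 0.2, "Medium": 0.6, "Good": 0.2},
    "High": {"Bad": 0.6, "Medium": 0.35, "Good": 0.05},
}


def wqi_beliefs(evidence):
    """Posterior beliefs of WQI given evidence on PWS and/or WCS."""
    scores = {"Bad": 0.0, "Medium": 0.0, "Good": 0.0}
    for pws, wcs, load in product(("Low", "High"), repeat=3):
        # Skip state combinations that contradict the scenario's evidence.
        if evidence.get("PWS", pws) != pws or evidence.get("WCS", wcs) != wcs:
            continue
        weight = PRIORS["PWS"][pws] * PRIORS["WCS"][wcs] * P_LOAD[(pws, wcs)][load]
        for state in scores:
            scores[state] += weight * P_WQI[load][state]
    total = sum(scores.values())
    return {state: value / total for state, value in scores.items()}


best = wqi_beliefs({"PWS": "High", "WCS": "Low"})    # best-case entrance conditions
worst = wqi_beliefs({"PWS": "Low", "WCS": "High"})   # worst-case entrance conditions
print(best)
print(worst)
```

In this toy example the best scenario lowers the belief in the Bad state, yet the Good state remains the least likely outcome, because the conditional table for WQI gives it little probability under any load. This mirrors, in miniature, how a network learned from data with few Good cases cannot be pushed far toward Good by changing the entrance nodes alone.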

Conclusion
Both BBNs were responsive and sensitive to water quality parameters. At the two scales analyzed (whole watershed and city influence area), Total Nitrogen and E. coli were the most essential parameters for simulating changes in water quality scenarios.
The simulated scenarios revealed structural limitations in the sanitary systems of the Pantanal Wetland cities considered in this study. Even in the best scenario, the beliefs do not indicate a Good WQI state for the Vermelho River, probably because of limitations of the dataset or an insufficient sanitary system structure. We strongly recommend a review of the goals for sanitary structure and services and warn of the risk of a sanitary crisis in the Pantanal Wetland.
As a suggestion for future research, the methodology could be applied to other river basins, with some adaptation of the intrinsic variables. The method can be scaled to large geographical areas because the same decisions can cover several river basins. However, the gap effect of decisions will need to be considered as an essential factor in model adjustments. Some environmental effects cannot be modeled under decision-making assumptions because they result not from decisions but from an absence of initiatives. This paradox calls for future scientific discussion.