Description of health patterns of fully vaccinated older adults hospitalized due to COVID-19 in Brazil through association rules

Coronavirus disease 2019 (COVID-19) is a global public health problem. Since the beginning of the pandemic, declared in March 2020, Brazil has shown high lethality from the disease in older adults. From 2012 to 2018, the country's older adult population increased by 20%. Despite the completion of vaccination protocols against COVID-19 in the country, there is evidence that advanced age, combined with the presence of comorbidities, can be a predictor of hospitalization and severe symptoms due to COVID-19. Accordingly, this paper aimed to identify patterns and relationships between symptoms, comorbidities, gender, Intensive Care Unit (ICU) admission, and survival status of older adults, fully vaccinated against COVID-19, hospitalized in Brazil. For this purpose, we performed association rule mining on the OpenDataSUS database. For the group of patients with comorbidity, associations with oxygen saturation (SpO2) <95%, dyspnea and death were predominant; the female sex was associated with survival and the presence of comorbidities, while the male sex was associated with death and admission to the ICU; for patients admitted to the ICU and who died, associations with SpO2 <95%, dyspnea, presence of comorbidities and use of ventilatory support were found. The association rule mining procedure proved useful in surveying the hospitalization profile of these patients.


Introduction
Coronavirus disease 2019 (COVID-19) is a major global public health emergency. It is a severe acute respiratory syndrome caused by the new coronavirus (SARS-CoV-2), first reported in China in December 2019 and recognized as a pandemic by the World Health Organization (WHO) on March 11, 2020 (Cash & Patel, 2020). From its discovery until February 22, 2022, the disease accounted for about 426 million confirmed cases and 5 million deaths worldwide (WHO, 2021). More than 190 countries were affected, and those with the highest number of deaths were the United States of America (USA), Brazil, India, Mexico, and Russia. Of this group, Brazil, India, and Russia belong to the BRICS, a political and economic grouping of developing countries that also includes China and South Africa. Zhu et al. (2021) pointed out that the number of new daily cases of COVID-19 was aggravated by social inequality and health vulnerabilities in the BRICS, with Brazil being the country with the highest number of deaths.
On February 25, 2020, Brazil registered its first confirmed case of COVID-19 (Candido et al., 2020). By February 22, 2022, the disease had affected more than 28 million people in the country, with a higher incidence among adults and higher mortality among older adults (aged 60 years or over) (Brazil, 2021). From 2012 to 2018, there was a 20% increase in the older adult population in Brazil (FGV, 2021). Health conditions such as the presence of chronic diseases, together with hospitalization characteristics and symptoms, can influence the occurrence of deaths from COVID-19 in these age groups (Rocha et al., 2021). These conditions can be classified as health characteristics or circumstances that present persistently, requiring fragmented or continuous active responses from health systems (Mendes, 2018).
Faced with the challenge of reducing the risk of death associated with the listed conditions, the development of a vaccination protocol became an ethical imperative. Vaccination against COVID-19 began worldwide in December 2020, but in Brazil it started only in January 2021, under emergency approval by the Brazilian Health Regulatory Agency (ANVISA), with the following vaccines: CoronaVac, Pfizer, Janssen and AstraZeneca (Bee et al., 2022). Government records up to December 9, 2021, indicate that Brazil had fully vaccinated (with the two-dose protocol) about 151 million citizens, or 75% of its population (Vaccination Brazil Platform, 2022). However, according to Moreno-Perez et al. (2022), countries should be aware of the clinical characteristics and predictors of poor outcomes in the elderly population, despite the completion of the vaccination scheme, in order to identify effective means of protection and of managing health services.
Research, Society and Development, v. 11, n. 16, e36111637666, 2022 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v11i16.37666
Advanced data mining techniques, such as Association Rules Mining (ARM), have been used in the recognition of patterns of these conditions in populations during the COVID-19 pandemic (Shawkat et al., 2021). Data mining can be described as an intelligent method that allows the identification of useful and understandable patterns and relationships of items in a database, which has ARM as one of its oldest fields in recognizing these patterns (Williams, 2011). Association rules procedures are popularly known as Market Basket Analysis, which estimates the probability of products or groups of products being purchased simultaneously. This technique is useful in the analysis of health conditions, as it can identify symptoms, morbidities and other conditions that present themselves simultaneously in a group of individuals (Tandan et al., 2021). Therefore, considering the change in the Brazilian age structure in recent years and the emergence of health research during the COVID-19 pandemic, this study aimed to discover patterns and relationships between symptoms, comorbidities, gender, and deaths of older adults, vaccinated against COVID-19, hospitalized in Brazil. This paper is organized into methodology (section 2), the description of the results (section 3), discussion (section 4) and conclusions (section 5).

Data extraction and population
This is a descriptive study of data from older adults, vaccinated against COVID-19, hospitalized in Brazil. The data were extracted on November 23, 2021, from the OpenDataSUS online platform (2022), covering the period from February 14th, 2021 to November 11th, 2021. The 2021 database was chosen because it records patients' vaccination data, since vaccination against COVID-19 started in January of that year in Brazil. This database is fed by health professionals through an individual notification form for patients with flu syndrome, which collects demographic, symptom, vaccination status and morbidity profile information.
Flu syndromes are characterized by symptoms that include fever, headache, chills, sore throat, and cough. Among the potential causative agents are the viruses that cause Influenza, in addition to Adenoviruses and Coronaviruses (SARS-CoV-2).
For confirmation of cases, molecular diagnosis (RT-PCR) or immunological (antibody or antigen) screening is performed. This study included hospitalized patients with SARS-CoV-2 infection confirmed by RT-PCR, aged 60 years or over, residing in Brazil (see Figure 1). In this research, data from 62,180 hospitalized patients with complete vaccination status were used. Subsequently, a subset of 18,956 patients aged 60 years or over was selected for the association rule analysis. Originally, there were 162 variables in the database, and those with the following characteristics were excluded: (a) administrative data on hospitalization; (c) 90% to 100% missing values; (d) information regarding flu syndromes other than COVID-19; and (e) information that did not characterize symptoms, age group, gender, comorbidities, or vaccination and survival status. The final data set consisted of 20 variables containing information on gender, survival, death, symptoms, chest X-ray results, comorbidities, use of ventilatory support, ICU admission, regions of residence and vaccination status. The symptoms recorded were fever, cough, sore throat, anosmia, ageusia, dyspnea, respiratory decompensation, peripheral capillary oxygen saturation (SpO2), diarrhea, vomiting, fatigue and abdominal pain. Comorbidities included heart disease, neuropathy, lung disease, hematopathy, asthma, diabetes, immunosuppression, nephropathy, and obesity. Brazil has 27 federative units and 5,570 municipalities. Its federative units are grouped into five regions: North, Northeast, Central-West, Southeast and South. The patients' regions of residence were also included.
In this study, these variables make up what are called health conditions. The data were converted to a "transaction" format and analyzed using the Apriori Algorithm, available in the "arules" package in R (Hahsler et al., 2022).
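As an illustration of what a "transaction" format means, the following is a minimal sketch in Python (the study itself used the "arules" package in R, so this is a stand-in, not the authors' code). It assumes each patient record is a mapping from variable to value and turns every variable/value pair into one item. The field names and values are hypothetical and are not OpenDataSUS field names.

```python
# Hypothetical toy records; field names are illustrative only.
patients = [
    {"sex": "F", "dyspnea": "yes", "icu": "no", "outcome": "survived"},
    {"sex": "M", "dyspnea": "yes", "icu": "yes", "outcome": "died"},
]


def to_transaction(record):
    """Encode one patient record as a set of 'variable=value' items."""
    return frozenset(f"{field}={value}" for field, value in record.items())


# One transaction (item set) per patient.
transactions = [to_transaction(p) for p in patients]
```

Encoding both levels of a variable (for example "icu=yes" and "icu=no") is what allows rules to mention conditions such as non-admission to the ICU or survival.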

Missing data
The OpenDataSUS database has variables that contain missing values. When values were missing for comorbidities, symptoms, radiological exam records or use of ventilatory support, we imputed the clinical characteristic as absent for the individual, as in Baqui et al. (2020a) and Baqui et al. (2020b).
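The imputation rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: missing values are represented as `None` (or as an absent key), the label "absent" is a placeholder, and the field names are hypothetical rather than taken from OpenDataSUS.

```python
def impute_absent(record, clinical_fields, absent="absent"):
    """Return a copy of `record` where missing clinical fields are set to `absent`.

    A field counts as missing when its key is not present or its value is None.
    Fields outside `clinical_fields` are left untouched.
    """
    imputed = dict(record)
    for field in clinical_fields:
        if imputed.get(field) is None:
            imputed[field] = absent
    return imputed


# Hypothetical record: diabetes is missing, obesity was never recorded.
patient = {"sex": "F", "diabetes": None, "dyspnea": "present"}
clean = impute_absent(patient, ["diabetes", "dyspnea", "obesity"])
```

Note that this choice treats "not recorded" as "not present", which can understate the frequency of a condition when the form was simply left blank.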

Association Rule Mining (ARM)
ARM is a descriptive data mining method that identifies patterns in transactions based on the frequency of co-occurrence of, and the relationships between, items, characteristics or events present in a data set (Jean-Marc, 2001).
Formally, a transaction (T) is composed of a set of items, and a rule takes the form X → Y, where X and Y have no elements in common, that is, they are disjoint (X ⋂ Y = ∅). Set X is called the antecedent (left-hand side) and set Y the consequent (right-hand side). This method also makes it possible to measure the quality of an association rule through metrics, the most common being: (1) support, (2) confidence and (3) lift (Shin et al., 2018). Together, these values indicate how often the consequent occurs among the transactions that contain the antecedent.
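For reference, the three metrics are commonly defined as below. The notation is an assumption made for this sketch: X denotes the antecedent, Y the consequent, and D the set of all transactions (the symbol D does not appear in the original text).

```latex
\[
\operatorname{supp}(X) = \frac{|\{T \in D : X \subseteq T\}|}{|D|}
\]
\[
\operatorname{supp}(X \rightarrow Y) = \operatorname{supp}(X \cup Y), \qquad
\operatorname{conf}(X \rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \qquad
\operatorname{lift}(X \rightarrow Y) = \frac{\operatorname{conf}(X \rightarrow Y)}{\operatorname{supp}(Y)}
\]
```

A lift of 1 means the antecedent and the consequent occur together as often as expected under independence; values above 1 indicate a positive association and values below 1 a negative one.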
In the health field, it is important to recognize the patterns in clinical manifestations of a disease, as this can guide the planning and organization of healthcare. An example rule is: {dyspnea} → {death} (support = 0.10, confidence = 0.90, lift = 1.80). This rule indicates that 10% of patients reported both dyspnea and death, and that the probability of a patient dying given that they reported dyspnea is 90%. Death is 80% more likely when dyspnea is reported than in the sample as a whole.
In this study, rules were extracted using the Apriori Algorithm, which identifies the most frequent item sets subject to a minimum support, making the data more manageable and allowing rules to be discovered and redundant ones eliminated. The property underlying the Apriori Algorithm can be stated as: ∀ X, Y: (X ⊆ Y) → Support(X) ≥ Support(Y). So, if an item set is frequent, its subsets are too. Conversely, if an item set is not frequent, no set that contains it is frequent either. Therefore, the support of an item set never exceeds the support of any of its subsets. To exemplify the operation performed by the Apriori Algorithm, consider the scheme shown in Figure 3. Here, a minimum support above 5% was defined. Only the 10 rules with the highest support and confidence scores were reported.
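To make the arithmetic behind support, confidence and lift concrete, here is a minimal Python sketch on ten made-up transactions. The data are invented for illustration and do not reproduce any figure from the study; the function names are hypothetical. Exact fractions are used so that the results are not affected by floating-point rounding.

```python
from fractions import Fraction

# Ten invented transactions (one set of reported conditions per patient).
toy = [
    {"dyspnea", "death"}, {"dyspnea", "death"}, {"dyspnea", "death"},
    {"dyspnea"}, {"death"}, {"cough"}, {"cough"},
    {"cough", "death"}, {"fever"}, {"fever"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift


supp, conf, lift = rule_metrics({"dyspnea"}, {"death"}, toy)
print(float(supp), float(conf), float(lift))
```

In this toy sample, 3 of 10 patients report both items, 4 report dyspnea and 5 died, so the rule has support 3/10, confidence 3/4 and lift 3/2: death is 1.5 times as frequent among patients with dyspnea as in the sample overall.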
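The level-wise search that relies on this property can be sketched in a few lines of Python. This is a simplified teaching version, not the "arules" implementation used in the study: it keeps item sets whose support is greater than or equal to the threshold, and the transactions and item names are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent item sets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    frequent = {}
    current = []
    for item in sorted({i for t in transactions for i in t}):
        candidate = frozenset([item])
        s = support(candidate)
        if s >= min_support:
            frequent[candidate] = s
            current.append(candidate)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate with any infrequent (k-1)-subset cannot be
        # frequent, so it is discarded without counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                current.append(c)
        k += 1
    return frequent


toy = [
    frozenset({"dyspnea", "spo2_below_95", "death"}),
    frozenset({"dyspnea", "spo2_below_95"}),
    frozenset({"dyspnea", "death"}),
    frozenset({"spo2_below_95"}),
]
frequent_sets = apriori(toy, min_support=0.5)
```

With a threshold of 0.5, the pair {spo2_below_95, death} appears in only one of four transactions and is dropped, so the three-item set is pruned before its support is ever counted.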

Ethical aspects
This study was carried out with publicly accessible data, with the guarantee of anonymity in the database available on public platforms in Brazil and did not require approval by an ethics committee. The legal ethical precepts established by

Results
SpO2 below 95% was the most common symptom, followed by dyspnea, cough and respiratory decompensation (see Figure 4a).
The most prevalent comorbidities were heart disease and diabetes, while the least frequent were liver disease and Down syndrome (see Figure 4b). The study population had hospitalization records across the 27 Brazilian states, with a predominance in the Southeast Region (see Figure 4c). It is noteworthy that 35% (n = 6,572) of the patients reported symptoms that were not listed in the OpenDataSUS record.
For patients with comorbidities (n = 14,328), a total of 3,168 rules were found. Among the top 10, all rules have a confidence equal to or above 80% (see Table 1). That is, if a patient reported these conditions, the estimated probability that the patient had a comorbidity is at least 80%. SpO2<95% and death were the most predominant conditions and, whether listed individually or together with other conditions, such as respiratory decompensation, female gender, dyspnea, and ICU admission, they have a lift equal to 1.07. That is, an individual is 7% more likely to have a comorbidity when these conditions are present.
For patients without comorbidities (n = 4,628), the algorithm discovered 1,326 rules. The predominant characteristics were the absence of radiological records, SpO2<95%, non-admission to the ICU and dyspnea. Although SpO2<95% and dyspnea appear as common symptoms in this group, they occur less frequently than in the comorbid group. Reports of SpO2<95% and dyspnea, individually or together, as well as cough, are negatively correlated with the consequent, because their lift is less than 1. In other words, as the number of patients who reported these symptoms increases, the number of patients who reported not having comorbidities decreases. It is noteworthy that the use of non-invasive ventilatory support is independent of the consequent, and that the condition with the highest lift among those listed is survival.
For the female group (n = 9,025), 1,202 rules were obtained. The most predominant conditions reported were survival, no ICU admission and use of non-invasive ventilatory support, with confidence equal to or greater than 50% (see Table 2). These conditions are also listed together with other critical clinical features, such as SpO2<95% and presence of comorbidity, with lift greater than 1. For males (n = 9,931), a total of 1,267 rules were identified. Unlike the female group, in the top 10 rules for the male group the conditions of death, fever and ICU admission stood out, along with critical clinical features such as dyspnea and SpO2<95%. When death and ICU admission are reported together, the lift equals 1.09.
When patients were disaggregated by ICU admission status, 1,277 rules were discovered for admitted patients (n = 7,098) and 1,803 for patients not admitted (n = 11,858). Among the rules found for admitted patients, the use of invasive ventilatory support, dyspnea, death, SpO2<95% and presence of comorbidity predominate (see Table 3).
Clinical conditions that were reported concurrently with the use of invasive ventilatory support had a lift greater than 2. For patients not admitted, the use of non-invasive ventilatory support and the absence of radiological records predominate. Unlike the group of patients who were admitted to the ICU, for which death is predominant in the rules, survival is reported by those not admitted with a confidence of 80%. Furthermore, conditions that usually make up a critical clinical picture of the disease, such as dyspnea and the presence of comorbidity, had a lift below 1 when reported individually, indicating that they are negatively correlated with the consequent, non-admission to the ICU.
A total of 607 rules were discovered for surviving patients (n = 9,909) and a total of 288 for patients who died (n = 9,047). For the first group, non-admission to the ICU was predominant, followed by the use of non-invasive ventilatory support and the absence of radiological records, with confidences above 60% (see Table 4). Cough, considered a mild symptom of the disease, reported together with not being admitted to the ICU, presented the highest lift among the rules listed. Although critical conditions of the disease, such as SpO2<95% and dyspnea, are also found for this group, they are reported together with the most predominant characteristics, such as non-admission to the ICU and non-use of invasive ventilatory support.
For the second group, ICU admission stands out in the top 10, as well as SpO2<95%, dyspnea, use of invasive ventilatory support and the presence of comorbidity. It is noteworthy that ICU admission and use of invasive ventilatory support have a confidence of 90% and the highest lift among the conditions reported. In other words, among the individuals who died, the chances of presenting these conditions are equal to 90%. Regarding lift, death is 88% more likely when these conditions are reported together. It is noteworthy that no significant association rules were found for the patients' regions of residence or the results of radiological examinations.

Discussion
Using the ARM technique, this study identified association rules for symptoms, presence of comorbidities, gender, ICU admission and survival status of older adults vaccinated against COVID-19 and hospitalized in Brazil. Regarding symptoms, SpO2<95%, dyspnea and cough were the most frequent. The first two symptoms were reported simultaneously with ICU admission and death in the group of patients with comorbidities. The most frequent symptoms found in this research are similar to those reported elsewhere in the scientific literature (Alimohamadi et al., 2020; Fu et al., 2020), and corroborate the findings of Deng et al. (2020), who collected clinical data from patients in two hospitals in Wuhan and verified that these symptoms, when reported together with the presence of comorbidity, were also frequent in patients of advanced age, resulting in cardiac and pulmonary complications. Consistent with this, Jacobs et al. (2020), in a prospective cohort of hospitalized patients in New Jersey, identified that dyspnea predominates among patients aged 65 to 75 years, who had the lowest ratings of quality of life and of physical and mental health. These studies were conducted before patients were vaccinated, but recent studies such as the one by Alsaffar et al. (2022), carried out with hospitalized patients who had received a single dose of a vaccine against COVID-19, still showed that older adults had higher mortality rates than the younger group.
According to Andryukov and Besednova (2021), even when older adults have completed their vaccination protocol against COVID-19, policies are needed to monitor the effectiveness of the vaccine in this target population, since the presence of comorbidity and aging cycles can lead to severe disease. According to the same authors, clinical trials with this population are scarce because of its vulnerability and, in addition, health services do not always aim to measure the effects of vaccines over time. In our study, we also observed that the association rules for patients without comorbidities had lower support and confidence than those for patients with comorbidities, indicating that symptom patterns and other hospitalization characteristics predominate among patients with comorbidities.
Furthermore, in this paper, SpO2<95% and dyspnea were symptoms that appeared simultaneously with female gender in the group of patients with comorbidities. According to Atkins et al. (2020), hospitalized women with COVID-19 may have a higher prevalence of chronic kidney disease and asthma than men, which, added to a deteriorating clinical status, can lead to death. The effect of the pandemic can be further aggravated for women who are vulnerable to poverty and have diminished access to health services (Connor et al., 2020). Some authors, such as Mi et al. (2020), Jin et al. (2020), Capuano et al. (2020), and Lipsky and Hung (2020), identified that men tend to have higher odds of death than women. Raimondi et al. (2021), in an observational study of COVID-19 patients hospitalized in Bergamo, Italy, pointed out that women may have lower mortality rates than men, but once severe symptoms appear, their probability of dying is similar to that of men. In this study, survival stood out in the top 10 rules for the female group and death for the male group.
SpO2<95%, dyspnea, use of invasive ventilatory support, and the presence of comorbidity are predominant among the association rules for patients admitted to the ICU and those who died, with confidence above 60%. These clinical conditions are related to an increased risk of severity and mortality (Elghazaly et al., 2022). When admission to the ICU was reported together with use of invasive ventilatory support and death, the confidence was equal to 90%. The use of ventilatory support in hospitalized patients carries a risk of ventilator-associated pneumonia, which is recurrent in the ICU, with risk factors such as the use of endotracheal and nasogastric tubes; contaminated respiratory equipment; prolonged decubitus in the supine position; and preference for nasal intubation (Koenig & Truwit, 2006). Chang et al. (2021), in a systematic review and meta-analysis of COVID-19 and its clinical manifestations in ICU patients, found that the use of ventilatory support, acute kidney injury and acute respiratory distress syndrome (ARDS) are risk factors for mortality. SpO2<95% can be a predictor of in-hospital mortality in patients with COVID-19 and requires healthcare services to focus on reversing hypoxemia to reduce the chances of respiratory decompensation (Mejía et al., 2020).
Although vaccination is regarded as the main means of preventing mortality from COVID-19, the presence of comorbidity remains a co-determining factor of this outcome in older adults (Lv et al., 2021; PrabhuDas et al., 2021). Given this, governments need to invest not only in the production of vaccines, but also in mechanisms to combat chronic diseases. Primary Health Care (PHC) has the potential to solve up to 80% of a population's health problems, as well as to reduce the development of chronic and infectious diseases, hospitalization, and mortality rates (Pan American Health Organization, 2021). Passos et al. (2020) note that the chronic diseases prevalent among Brazilians are preventable by PHC services, the main ones being ischemic heart disease, ischemic and haemorrhagic stroke, chronic obstructive pulmonary disease, and diabetes. All of them are risk factors for mortality from COVID-19 (Nishiga et al., 2020). Misra-Hebert et al. (2021) identified that the pandemic affected users' access to PHC services for monitoring chronic diseases, such as diabetes, as higher levels of glycated hemoglobin were associated with lower chances of receiving health care (in person or virtual) after the beginning of the pandemic.
According to Assis et al. (2021), the regions of Brazil with PHC population coverage above 75% have a lower COVID-19 burden and better early care for symptomatic and comorbid patients, preventing the occurrence of severe cases of the disease.
In view of this, it is necessary to invest in PHC actions and in the expansion of its population coverage for the prevention and control of comorbid older adult patients, combined with the application of vaccines, to reduce mortality from COVID-19 in this group.

Final Considerations
In the present study, it was observed that the main characteristics of hospitalization among older adults vaccinated against COVID-19 show a profile of presence of comorbidities, critical symptoms of COVID-19 (SpO2<95% and dyspnea) and use of ventilatory support. The data mining methodology by association rules, in particular the Apriori Algorithm, proved to be useful in surveying the profile of these patients.
That said, there is a need to visualize the profile of patients so that the SUS can organize itself to meet the ongoing demographic transition in Brazil, to promote the quality of life of this age group and to reduce its hospitalization and mortality rates. The limitations of this study include the use of secondary data, which are subject to probable underreporting and data entry errors. Nor was there any information regarding the variants of the etiological agent of COVID-19 or the laboratories producing the vaccines. Furthermore, it is noteworthy that causality between the items found in the association rules cannot be asserted. Thus, it is recommended that future studies investigate probable causality between the findings presented here.