Reliability of methods using a new graphic template to evaluate alveolar bone graft in cleft lip palate on radiographs

This research aimed to evaluate the reliability of methods using a new graphic template to evaluate alveolar bone graft in cleft lip and palate on radiographs. The sample consisted of 30 radiographs of individuals with bone grafts, which were analyzed by two raters using the SWAG and Chelsea alveolar bone graft rating methods. The images were first analyzed in PowerPoint without any aid and then with a template designed in PowerPoint by the examiners. Inter-rater and intra-rater reliability were determined using weighted Kappa statistics, with and without the template, in Jamovi 1.2 software. Intra-rater reliability was determined on 10 randomly selected radiographs. Inter-rater reliability in the SWAG and Chelsea methods was moderate without the template (0.574 and 0.519) and good with the template (0.745 and 0.735). Intra-rater reliability without the template was good in the SWAG and Chelsea methods (0.710-0.610 and 0.634-0.639); with the template, it was very good in the SWAG scale (1 and 0.846) and good to very good in the Chelsea method (0.872 and 0.762). The use of a template to evaluate the images of alveolar bone grafts in both methods had a positive impact on the results, increasing inter-rater reliability to good and intra-rater reliability to very good.


Introduction
Cleft lip and palate are the most common craniofacial anomalies in humans, with a frequency of about 1 in every 650 to 1000 births, varying according to sex and ethnicity (Vanderas, 1987). Different phenotypes associated with syndromic conditions are found (Botticelli, et al., 2020). Because it compromises aesthetics, phonetics, and nutrition, this malformation affects the quality of life of these patients. The treatment protocols involve surgical procedures such as cheiloplasty, palatoplasty, alveolar bone graft, and orthognathic surgery (Burg, et al., 2016; Worley, et al., 2018).
The ideal timing for alveolar bone graft is when the root of the canine in the cleft is 50% complete (Hynes & Earley, 2003), which permits the eruption of this tooth through the grafted bone (Calvo, et al., 2014; Dao & Goudy, 2016; Pinheiro, et al., 2020). After bone grafting, the result must be radiographically evaluated before treatment continues. Ideally, three-dimensional evaluation using Cone-Beam Computed Tomography (CBCT) scans is the option of choice. However, in addition to the difficulties in accessing CBCT scans as a routine in treating cleft lip and palate around the world, there is still no validated method in the literature for using this diagnostic tool (Yu, et al., 2020; Stasiak, et al., 2020).
Despite the limitations of 2-D radiographs in evaluating the 3-D nature of the cleft defect (Yu, et al., 2020), radiographs are accessible in the treatment routine, inexpensive, easy to perform, and widely used in comparative studies of alveolar bone graft success. The universalization of this two-dimensional assessment is part of several multicenter studies (Mahajan, et al., 2017; Russel, et al., 2017).
The alveolar bone graft aims to correct defects in the alveolar cleft. The bone graft promotes stabilization and continuity of the upper arch (Hogan, et al., 2003; Jia, et al., 2006), provides the necessary bone support for the dental elements, recovers facial symmetry (Arangio, et al., 2008; Yu, et al., 2020), and contributes to closing oronasal fistulas (Dempf, et al., 2002).
Several methods are described in the literature to evaluate the results of the alveolar bone graft in radiographic images (Bergland, et al., 1986; Kindelan, et al., 1997; Hynes & Earley, 2003). The Bergland grading system is the 2-D gold standard but depends on the complete eruption of the permanent canine (Russel, et al., 2017). In order to address this methodological gap, Witherow et al. (2002) developed the Chelsea method, which is also valid for the mixed dentition.
Given the relevance of bone graft in cleft patients to restore function, aesthetics, and quality of life, auditing the bone graft results becomes essential. The choice of a reproducible evaluation method can influence the decision on the continuity of dental treatment. Thus, this study aims to evaluate the reliability of methods using a new graphic template to evaluate alveolar bone graft in cleft lip and palate on radiographs.

Methodology
The methodology used in this study was based on the inductive approach, descriptive technique by direct observation, and subsequent statistical procedures (Pereira et al., 2018). This study was approved by the Ethics Committee of the Health Sciences Center, of the University Hospital (13450819.6.0000.5183). A total of 65 radiographic images were evaluated. They came from patients of the Cleft Lip and Palate Center of Hospital who received alveolar bone graft surgery between 2017 and 2020, performed by two surgeons. Bone grafting followed the Boyne and Sands technique (Boyne & Sands, 1972). All cases used iliac crest bone, either fragmented or non-fragmented.
Radiographs were evaluated to train the evaluators and to determine the parameters to be measured during the evaluation of periapical radiographs, as well as to establish inclusion and exclusion criteria for the sample composition. The exclusion criteria consisted of low-quality images, stained films, radiographs with orthodontic devices, and dental elements overlapping the bone defect. Only good-quality images taken at least three months after surgery, framing the dental elements adjacent to the defect in a way that allowed the measurements, were included in the sample.
Thirty of the 65 radiographs were included in the sample and were analyzed by two raters using the SWAG and Chelsea scales, both alveolar bone graft rating methods. The raters first analyzed the images in PowerPoint without any aid and then introduced a template (Figure 1). The template was designed in PowerPoint by the examiners according to the reference lines established by the two methods. The template for the SWAG method was designed with three equal thirds, with the first line positioned at the cementoenamel junction and the third line at the root apex. The design for the Chelsea scale presented four divisions, with the same positioning. The template was adjusted to each radiographic image according to the cervical and apical limits of the dental elements, so that its enlargement or reduction did not affect the measurements of the thirds or quarters.
The SWAG scale evaluates the image by dividing the cleft region into three vertical thirds (apical, middle, and coronal) based on the dental element adjacent to the grafted defect area. Each third receives a score from 0 to 2: 0 when there is no root bone covering, 1 when there is no bone bridge formation but the roots of the elements are covered with bone, and 2 when there is bone bridge formation. The total score, from 0 to 6, is given by adding the values of each third (Russel, et al., 2017).
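The template geometry and the SWAG total described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PowerPoint template used by the examiners: the function names `template_lines` and `swag_total` are hypothetical, the coordinates are arbitrary image units, and the lines are assumed to be equally spaced between the cementoenamel junction and the root apex (both endpoints included in the returned list).

```python
from typing import List, Sequence


def template_lines(cej_y: float, apex_y: float, divisions: int) -> List[float]:
    """Return equally spaced line positions from the CEJ to the root apex.

    Hypothetical helper: divisions=3 mimics the SWAG thirds and
    divisions=4 the Chelsea quarters. Both endpoints are included,
    so the list has divisions + 1 entries.
    """
    if divisions < 1:
        raise ValueError("divisions must be at least 1")
    step = (apex_y - cej_y) / divisions
    return [cej_y + i * step for i in range(divisions + 1)]


def swag_total(third_scores: Sequence[int]) -> int:
    """Sum the three per-third SWAG scores (each 0, 1, or 2) into a 0-6 total."""
    if len(third_scores) != 3:
        raise ValueError("SWAG needs exactly three scores, one per third")
    if any(score not in (0, 1, 2) for score in third_scores):
        raise ValueError("each third must be scored 0, 1, or 2")
    return sum(third_scores)


if __name__ == "__main__":
    # Illustrative coordinates only, not measurements from the study.
    print(template_lines(0.0, 30.0, 3))
    print(swag_total([2, 1, 0]))
```

Because the lines are computed from the two anatomical limits, rescaling the image changes the coordinates but not the proportions of the thirds or quarters, which is the property the template adjustment was meant to preserve.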
In the Chelsea method, each dental element adjacent to the grafted defect is divided into four vertical quarters, using the cementoenamel junction and the apical region as references. Each element is independently evaluated by dividing the grafted region with a vertical line equidistant from the two elements. Each quarter of an element receives a score: 0 when there is no bone overlying the root, 0.5 when bone overlies the root but does not reach the midline, and 1 when the grafted bone reaches the midline. The total score is given by adding the values of each quarter (Witherow, et al., 2002).
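As a minimal sketch of the Chelsea summation described above, the following Python function adds the quarter scores of the two elements adjacent to the cleft. The function name `chelsea_total` and the representation of each element as a list of four scores are assumptions made for illustration; they are not part of the published method.

```python
from typing import Sequence

# Allowed score for each quarter, as described in the text.
VALID_QUARTER_SCORES = (0, 0.5, 1)


def chelsea_total(tooth_a: Sequence[float], tooth_b: Sequence[float]) -> float:
    """Sum the quarter scores of the two elements adjacent to the cleft.

    Hypothetical helper: each argument holds four scores, ordered from
    the cervical to the apical quarter of one element.
    """
    for quarters in (tooth_a, tooth_b):
        if len(quarters) != 4:
            raise ValueError("each element needs exactly four quarter scores")
        if any(score not in VALID_QUARTER_SCORES for score in quarters):
            raise ValueError("each quarter must be scored 0, 0.5, or 1")
    return float(sum(tooth_a) + sum(tooth_b))


if __name__ == "__main__":
    # Illustrative scores only, not data from the study.
    print(chelsea_total([1, 1, 0.5, 0], [1, 0.5, 0.5, 0]))
```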
The inter-rater reliability and intra-rater reliability were determined using weighted Kappa statistics (Fleiss & Cohen, 1973), with and without the PowerPoint template, in the Jamovi 1.2 software program (The jamovi Project, 2020). The Altman (1991) interpretation was used to classify the Kappa values, to be consistent and comparable with previous investigations (Russel, et al., 2017). The intra-rater reliability was determined on 10 radiographs randomly selected through the website randomizer.org.
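For readers who wish to reproduce this kind of analysis outside Jamovi, the sketch below computes a weighted Kappa for two raters and labels it with the Altman (1991) bands. It is not the Jamovi implementation. The weighting scheme used in the study is not stated here, so the `weights` parameter (quadratic by default, linear as an alternative) is an assumption, as are the function names `weighted_kappa` and `altman_label`.

```python
from typing import Hashable, Sequence


def weighted_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable],
                   categories: Sequence[Hashable],
                   weights: str = "quadratic") -> float:
    """Weighted Cohen's Kappa for two raters over ordered categories."""
    k, n = len(categories), len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("both raters must score the same non-empty sample")
    if k < 2:
        raise ValueError("at least two ordered categories are required")
    index = {category: i for i, category in enumerate(categories)}

    # Observed joint proportions and the marginal proportions of each rater.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        distance = abs(i - j) / (k - 1)
        return distance * distance if weights == "quadratic" else distance

    observed_d = sum(disagreement(i, j) * observed[i][j]
                     for i in range(k) for j in range(k))
    expected_d = sum(disagreement(i, j) * row[i] * col[j]
                     for i in range(k) for j in range(k))
    if expected_d == 0:
        raise ValueError("Kappa is undefined when no disagreement is expected")
    return 1.0 - observed_d / expected_d


def altman_label(kappa: float) -> str:
    """Map a Kappa value to the Altman (1991) strength-of-agreement band."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"
```

For SWAG totals, `categories` would be `range(7)` (scores 0 to 6); applying `altman_label` to the Kappa values reported in this study, for example 0.574 and 0.745, returns "moderate" and "good", respectively.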

Results
The Chelsea and SWAG methods showed a positive correlation (r = 0.86), as expected (Figure 2). The inter-rater and intra-rater reliability test results (with and without templates) are shown in Table 1. According to the Altman interpretation of the Kappa values (Altman, 1991), the inter-rater reliability in the SWAG and Chelsea methods was moderate without the template and good with the template in both scales (Figure 3). The intra-rater reliability was good in the SWAG and Chelsea methods without the template; with the templates, it was very good in the SWAG scale and good to very good in the Chelsea method (Figure 4).
Research, Society and Development, v. 10, n. 12, e138101220068, 2021 (CC BY 4.
The inter-rater reliability of the studied methods was also analyzed by region, in the thirds of the SWAG scale (cervical, middle, and apical) and in the quarters of the Chelsea method (cervical, second quarter, third quarter, and apical), and is presented in Table 2.
In the SWAG scale, the inter-rater reliability of the thirds went from good without the template to very good with the template, except for the apical third, which remained with good inter-rater reliability (Figure 5). Among the quarters of the Chelsea method, only the Kappa value of the second quarter increased to very good (Figure 6).

Discussion
An alveolar bone graft represents a relevant outcome to be evaluated in treating individuals born with cleft lip and palate. A method which assesses the quality of this graft should have the potential to detect defects in the bone graft before the eruption of the dental element in the cleft (Witherow, et al., 2002). Of the various existing methods, the SWAG method is based on the presence or absence of root exposure and bone bridge formation, and is described as being simple, practical in the scoring process, viable, validated for mixed and permanent dentition, and with the advantage of locating bone graft deficiency by dividing the cleft into thirds (Russel, et al., 2017). The Chelsea method was developed to assess the position of the grafted bone based on the roots of the dental elements adjacent to the cleft and is also valid for the mixed dentition (Witherow, et al., 2002).
The alveolar bone graft must be performed when the canine has up to two-thirds of the root formed so that it can erupt in the grafted bone (Pinheiro, et al., 2020). Thus, the possibility of evaluating the bone graft in the mixed dentition represented the main reason for choosing these two methods in calibrating the present study.
A positive correlation between the methods was expected, since in both scales the total score increases with the presence of bone in the evaluated regions. Also, it is suggested that if the score in the SWAG method is 1 in a given third, meaning that there was no bone bridge formation, then one of the elements measured using the Chelsea method will correspond to 0.5, since without a bone bridge there can be no bone at the reference midline.
The inter-rater reliability of the Chelsea (0.519) and SWAG (0.574) scales without the template was moderate in the present study, whereas the intra-rater reliability was good. This means there is little distinction between the scales regarding their inter-rater reproducibility. Other studies which evaluated the reproducibility of the studied methods also found moderate inter-rater reliability, of 0.569-0.681 for the SWAG and 0.50 for the Chelsea scale (Nightingale, et al., 2003). This demonstrates that these methods may have some deficiency in their reproducibility. Although Russell et al. (2017) affirm that the binary response (absence or presence of bone) facilitates classification in the SWAG method, the reliability results do not confirm this, and the possible subjectivity in the division of thirds may result in different scores between raters and consequently interfere with the agreement values. The same can be seen in the Chelsea method. Therefore, both methods present this limitation regarding the division of the areas of the grafted defect, and the diverging results between the examiners may be due to this factor.
Another limitation refers to the positioning of the dental elements adjacent to the grafted bone defect, since their unevenness makes it difficult to divide the cleft region equally. A possible solution to these limitations was to use a tool capable of facilitating the division of regions and making it more objective, offering greater visibility for measuring scores, with a consequent increase in the reproducibility of the scales. Inserting the template in the evaluation of the images resulted in good inter-rater reliability, which rose to 0.745 for the SWAG and 0.735 for the Chelsea, as well as in inter-rater agreement in the thirds and quarters of the two scales that became good or very good. Lower agreement was observed in the cervical quarter of the Chelsea method, which has clinical implications for this bone position, since this is an important region for periodontal health (Witherow, et al., 2002).
The two-dimensional evaluation using periapical radiographs presents limitations related to the three-dimensional and irregular shape of the alveolar defect, overlapping structures, and distortions of the image (De Moura, et al., 2016; Yu, et al., 2020). However, the use of these radiographs is justified by their low cost, their ease of acquisition, and the fact that they are part of the professional's clinical routine. In addition, several methods which use radiographs for graft evaluation are described in the literature and have been widely used in this evaluation process (Bergland, et al., 1986; Kindelan, et al., 1997; Hynes & Earley, 2003). Regarding the use of periapical images, although the SWAG scale has been described using occlusal radiographs (Russel, et al., 2017), Nightingale et al. (2003) compared the use of periapical and occlusal radiographs in evaluating postoperative grafts and found no significant difference between them.
The three-dimensional evaluation using CBCT is the diagnostic method of choice, especially for volumetrically evaluating the alveolar cleft and measuring bone repair (De Moura, et al., 2016). There is a proposal in the literature for a new scale that uses the CBCT and considers all dimensions of the bone bridge, but still needs validation for its use (Kamperos, et al., 2020). In other words, so far there is no described and validated method which uses computed tomography to assess alveolar bone grafts (Yu, et al., 2020).
Furthermore, most treatment centers for these cleft patients do not have a CT scanner available, and the high cost makes CBCT inaccessible to the assisted population. Therefore, given the reality of the centers and patients, periapical radiographs become the only available option, and it is necessary to improve the assessment of graft quality using existing methods with the tools available to professionals. Moreover, the use of these radiograph-based scales does not exclude the use of CT scans for planning or treating cases; rather, the scales are encouraged for the multicentric evaluation of the results of alveolar bone grafts, as highlighted by the objective description of the SWAG method (Russel, et al., 2017).

Conclusion
The inter-rater reliability of the SWAG and Chelsea methods without the template was moderate, while the intra-rater reliability was good. The use of a template to evaluate the images of alveolar bone grafts in both methods had a positive impact on the results, increasing the inter-rater reliability to good and the intra-rater reliability to very good.
Thus, the future development of a digital tool capable of automating the delimitation of the reference lines of the scales, with a more precise and objective division, could make graft evaluation simpler, more reproducible, more efficient, and more accurate, in addition to improving its reliability. Such a tool could also make the prognosis of the grafts more assertive for the continuity of the dental treatment of these cleft patients and encourage the routine use of methods to assess graft quality in other treatment centers for cleft lip and palate, since this process must be present in routine clinical evaluations.