Forensic DNA Phenotyping: starting point to prediction model in Pernambuco population, Brazil

Research, Society and Development, v. 10, n. 13, e262101320955, 2021 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v10i13.20955

The study of Externally Visible Characteristics (EVC) of pigmentation associated with SNPs (Single Nucleotide Polymorphisms) has become a target in the forensic field because it makes it possible to characterize an individual phenotypically. In Brazil, few data are available on these markers, so further studies are needed to better understand how genetic markers relate to the pigmentation process. The aim of this study was to test the association between 8 SNPs present in the HIrisPlex tool and EVC to provide a starting point for the development of prediction models for heterogeneous populations like the one in Pernambuco. We evaluated 176 individuals, testing associations between the polymorphisms and self-reported eye, hair and skin color. Artificial intelligence tools were used to build the prediction models. Significant associations were found between EVC and rs1800404 (OCA2), rs6058017 (ASIP), rs16891982 (SLC45A2) and rs1426654 (SLC24A5). The prediction models evaluated showed satisfactory prediction rates: above 60% for skin color and above 70% for eye and hair color. The associations found in our data show the importance of evaluating the SNPs used in DNA phenotyping, because they can provide new information in the context of criminal investigations. Our data indicate that it is possible to use molecular information to predict phenotypes in admixed populations such as the Brazilian population. These polymorphisms could be phenotypic predictors for the Pernambuco population.

Keywords: Pigmentation genes; Miscigenated Population; Brazil; SNPs; Phenotypic prediction; Artificial intelligence.

Introduction
Forensic DNA Phenotyping (FDP) is the prediction of appearance traits from DNA present in biological samples (Queirós, 2019). The first FDP studies began in the 2000s (Koops & Schellekens, 2006). The technology can be applied in criminal investigations when DNA samples collected at crime scenes do not match any profile stored in forensic DNA databases and there are no other investigative leads or eyewitnesses, as well as in the identification of missing persons and of victims of mass disasters (Queirós, 2019).
The SLC24A5 gene (solute carrier family 24, member 5) is located on the long arm of chromosome 15 and encodes the NCKX5 protein (sodium/calcium/potassium exchanger 5), a solute transporter involved in melanosome maturation, in determining the type of melanin produced and in controlling melanosomal pH (Lamason, 2005). The SNP rs1426654 lies in the coding region of SLC24A5: a guanine-to-adenine substitution in exon 3 replaces alanine with threonine, reducing ion-exchange activity and pheomelanin synthesis (Cook et al., 2009; Jackson, 2006; Lamason, 2005).
MC1R (melanocortin 1 receptor) is the best characterized of the genes that regulate human pigmentation. It is located on chromosome 16 and consists of a single exon. It encodes a receptor located on the surface of melanocytes that plays an important role in melanogenesis (Makova & Norton, 2005). The polymorphism rs885479 is a guanine-to-adenine substitution that replaces arginine with glutamine at codon 163, and is therefore also known as the R163Q variant (Fernandez et al., 2007).
The OCA2 gene encodes the P protein, which has 12 transmembrane domains. It is believed to be involved in anion transport and the regulation of melanosomal pH, as well as in the processing and transport of melanosomal proteins such as tyrosinase and tyrosinase-related proteins (Bellono, Escobar, Lefkovith, Marks, & Oancea, 2014). The rs1800407 polymorphism is located in exon 13 of OCA2, at amino acid position 419; its mutant allele A encodes glutamine instead of arginine and is mainly associated with variation in iris color (Jannot et al., 2005; Rebbeck et al., 2002). rs1800404 is located in exon 10 of OCA2, where the mutant allele A produces a synonymous substitution; this polymorphism is most commonly related to skin pigmentation (Adhikari et al., 2019; Crawford et al., 2017).
The TYR gene encodes tyrosinase, a key enzyme in melanogenesis that catalyzes the three most important steps of melanin production: the hydroxylation of tyrosine to DOPA, the oxidation of DOPA to dopaquinone, and the oxidation of 5,6-dihydroxyindole to indole-5,6-quinone. Melanin production and its yield depend directly on TYR activity and expression (Feng et al., 2015). The gene is located on chromosome 11, contains 5 exons and spans about 65 kb of genomic DNA, encoding a tyrosinase of 529 amino acids. Almost 200 mutations have been found in this gene, including mutations associated with albinism, and some polymorphic variants are also associated with variation in eye, hair and skin color (Chaitanya et al., 2018; K & Purohit, 2013).
The allele A of rs1393350 in the TYR gene is associated with decreased tyrosinase activity, resulting in lightly pigmented, sun-sensitive skin. It has also been associated with susceptibility to blue rather than green eyes and to blond rather than brown hair (Jacobs et al., 2015; Sulem et al., 2007). The polymorphic allele A of rs1042602 (TYR) is associated with the absence of freckles and is found at a frequency of 35% in European populations, whereas the ancestral allele C is most frequent in East Asia (Sulem et al., 2007). SNP analysis with association tests has proven a useful tool for finding relationships between genetic markers and phenotypic characteristics in a given population (Virmond et al., 2016). Prediction models for EVC have been proposed for several populations, with the aim of finding models that predict skin, eye and hair color, or even all three simultaneously, with better accuracy (Chaitanya et al., 2018; Walsh et al., 2017). Artificial intelligence tools can add value to these prediction models by allowing different algorithms to be compared to identify which one best predicts EVC from SNP data (Zaorska, Zawierucha, & Nowicki, 2019). Therefore, the aim of this study was to test the association between 8 SNPs present in the HIrisPlex tool and Externally Visible Characteristics (EVC) to provide a starting point for the development of prediction models for heterogeneous populations like the one in Pernambuco.

Study population
An analytical cross-sectional study was performed. We evaluated 176 samples from healthy volunteers, including teachers, students and collaborators of the University of Pernambuco, Brazil, as well as volunteers from neighboring communities. Participants were men and women older than 18 years. All participants received information about the research and signed the Informed Consent Form, authorizing the use of their biological material and of the information collected.

Data collection
The volunteers answered a questionnaire in which they self-reported eye, skin and hair color. For hair color, participants were asked about their natural hair color at 15 years of age. To obtain a more accurate response, they were shown a chart of possible hair colors (red, blond, brown and black) and asked to point to the color closest to the one they had at that age (de Araújo Lima et al., 2015). For eye color, participants were shown a chart of different colors grouped into three shades: blue, green and brown (Walsh et al., 2011). For skin color, participants self-reported one of four categories: white, brown, yellow and black. To avoid bias in the self-reported information, the same interviewers collected all the data, and responses were adjusted when two other researchers observed a large discrepancy. This information was then tested for association with the genotypes (de Araújo Lima et al., 2015).

DNA isolation, amplification and genotyping
Either 4 mL of peripheral blood was collected by venipuncture into EDTA-treated tubes, or buccal mucosa cells were collected with sterile cytological brushes, using circular movements on the inside of the cheeks to capture the cells.
Each brush was placed in a 2 mL tube containing 1 mL of absolute ethanol. DNA extraction was performed with the mini salting-out method (Whikehart, 2003). DNA samples were amplified by the polymerase chain reaction (PCR).

Prediction Model
Prediction modeling was performed on 137 individuals using principal component analysis (PCA), followed by unsupervised learning with the K-means method. Supervised learning was then used to compare six algorithms: Random Forest, KNeighborsClassifier, DecisionTreeClassifier, RadiusNeighborsClassifier, ExtraTreesClassifier and Support Vector Classifier. For each algorithm, a global accuracy score and separate eye, hair and skin color accuracy scores were obtained.
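To make the supervised step more concrete, the sketch below encodes each individual's genotypes as allele dosages (0, 1 or 2 copies of the derived allele; this encoding is an assumption, since the text does not state how genotypes were coded) and scores a plain k-nearest-neighbours classifier by leave-one-out accuracy. It is a dependency-free stand-in for scikit-learn's `KNeighborsClassifier`, not the authors' pipeline; the toy data and the helper names `dosage`, `knn_predict` and `loo_accuracy` are hypothetical.

```python
from collections import Counter
from math import dist


def dosage(genotype: str, derived: str) -> int:
    """Count copies of the derived allele in a two-letter genotype such as 'GA'."""
    return genotype.count(derived)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest (Euclidean) to query."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k=3):
    """Leave-one-out accuracy: predict each individual from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, features, k) == label
    return hits / len(data)


# Hypothetical toy data: genotypes at two SNPs (derived allele 'A') and a skin-colour label.
raw = [
    (("GG", "GG"), "black"), (("GG", "GA"), "black"), (("GA", "GG"), "black"),
    (("AA", "AA"), "white"), (("AA", "GA"), "white"), (("GA", "AA"), "white"),
]
data = [([dosage(g, "A") for g in genos], label) for genos, label in raw]
print(f"Leave-one-out accuracy (k=3): {loo_accuracy(data):.2f}")
```

In a real comparison each of the six algorithms would be scored on the same folds, so that differences in accuracy reflect the algorithm rather than the split.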

Ethical considerations
The study was submitted to Plataforma Brasil for consideration by the Ethics Committee of the Oswaldo Cruz University Hospital (HUOC) that approved the research protocol, with CAAE number: 69723017.8.0000.5192.

Results
We analyzed 176 individuals with a mean age of 39.9 years; 55.1% were female and 44.9% male. For self-reported eye color, 68.7% reported dark brown eyes, 22.7% light brown, 7.4% green and 1.1% blue. For hair color, 63.1% reported black hair, 25.6% brown and 11.3% blond; for skin color, 50.6% reported black skin, 27.8% brown, 17% white and 4.5% yellow. In the genotype analyses, 172 individuals were evaluated for rs6058017 (ASIP), 154 for rs16891982 (SLC45A2), 176 for rs1426654 (SLC24A5) and 170 for rs885479 (MC1R).

rs1426654 (SLC24A5)
Genotype AA was strongly associated with brown skin (p = 0.0006; OR: 60.0), and the joint analysis of GA + AA also showed an association with brown skin (p = 0.01; OR: 13.54). Genotype AA was also strongly associated with white skin (p = 0.002; OR: 40.0) (Table 3A). In the grouped analysis, the homozygous mutant AA was strongly associated with non-black skin color (p = 0.0004; OR: 35.56), and the joint analysis of GA + AA also showed an association with non-black skin (p = 0.01; OR: 7.33) (Table 6A). No significant associations were found between this polymorphism and eye or hair color (Tables 1A, 2A, 4A and 5A).

rs885479 (MC1R)
A high frequency of the ancestral allele G was observed (94.1%), while the polymorphic allele A had a frequency of 5.9%. No individuals with the homozygous genotype AA were found. The homozygous genotype GG was the most frequent (88.2%), followed by the heterozygous genotype GA (11.8%), and no deviation from Hardy-Weinberg equilibrium was observed. No significant association was found between rs885479 and eye, hair or skin color (Tables 1A-6A).

OCA2 gene
For the OCA2 polymorphism rs1800404, the mutant allele A was more frequent (56.3%) than the ancestral allele G (43.6%). The heterozygous genotype GA was the most frequent (45.7%), followed by AA (33.5%) and GG (20.8%). No deviation from Hardy-Weinberg equilibrium was observed for this polymorphism.
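As a worked check, allele frequencies can be recovered from genotype frequencies, because each homozygote carries two copies of one allele and each heterozygote carries one copy of each. With the rs1800404 values reported above:

```latex
\hat{p}_A = f_{AA} + \tfrac{1}{2} f_{GA} = 0.335 + \tfrac{1}{2}(0.457) = 0.5635 \approx 56.3\%
\qquad
\hat{p}_G = f_{GG} + \tfrac{1}{2} f_{GA} = 0.208 + \tfrac{1}{2}(0.457) = 0.4365 \approx 43.6\%
```

The two estimates sum to 1, as they must for a biallelic SNP.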
Homozygosity for allele A (genotype AA) was associated with green eyes (p = 0.0084; OR: 17.39), and a significant association was also found when the heterozygous genotype was analyzed together with AA (p = 0.03; OR: 10.03) (Table 1B). In the grouped analysis, genotype AA was associated with the non-brown eye color group (p = 0.01; OR: 15.80), as was the combination of AA with the heterozygous genotype GA (p = 0.04; OR: 9.23) (Table 4B).
Significant associations were found between rs1800404 and hair color. Compared with the black hair group, the GA (p = 0.01; OR: 15.0) and AA (p = 0.01; OR: 15.32) genotypes were associated with blond hair, and the sum of the GA and AA genotypes reinforced the association (p = 0.01; OR: 15.0) (Table 2B). In the grouped analysis, the heterozygous genotype GA (p = 0.02; OR: 11.03), the homozygous mutant AA (p = 0.01; OR: 14.01) and their sum (p = 0.01; OR: 12.01) were also associated with blond hair color when compared with the non-blond group (Table 5B). (Table 6B).
A high frequency of the ancestral allele G of rs1800407 was observed in this population (93.2%), while the mutant allele A had a frequency of 6.8%. No individuals homozygous for the polymorphic allele (AA) were found; the ancestral homozygous genotype GG was the most frequent (86.4%), followed by the heterozygous genotype GA (13.6%), and no deviation from Hardy-Weinberg equilibrium was observed. No significant associations were found between rs1800407 and eye, hair or skin color (Tables 1B-6B).

TYR
Deviations from Hardy-Weinberg equilibrium were found for rs1042602 (χ² = 4.17, p = 0.04) and rs1393350 (χ² = 4.38, p = 0.03). For rs1042602, the ancestral allele C was more frequent (65.3%) than the polymorphic allele A (34.7%).
The most frequent genotype was the heterozygous CA, followed by CC and AA (52.3%, 39.2% and 8.5%, respectively). The ancestral allele G of rs1393350 was found at high frequency (86.3%), and the mutant allele A at 13.7%. No individuals with the homozygous genotype AA were identified in this population; the frequencies of the GG and GA genotypes of rs1393350 were 72.7% and 27.3%, respectively. No significant associations were found between rs1393350 or rs1042602 and eye, hair or skin color (Tables 1B-6B).
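The Hardy-Weinberg deviations reported for the two TYR SNPs can be approximately reproduced with a chi-square goodness-of-fit test with one degree of freedom. The study does not give genotype counts, so the counts used here are an assumption, reconstructed from the reported percentages and n = 176 (rs1393350: GG ≈ 128, GA ≈ 48, AA = 0; rs1042602: CC ≈ 69, CA ≈ 92, AA ≈ 15). The function name `hwe_chi_square` is hypothetical, and this is not necessarily how the authors computed their values.

```python
from math import erfc, sqrt


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts reconstructed (assumption) from the reported genotype percentages, n = 176.
for snp, counts in {"rs1393350": (128, 48, 0), "rs1042602": (69, 92, 15)}.items():
    chi2, p_value = hwe_chi_square(*counts)
    print(f"{snp}: chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With these reconstructed counts the sketch gives χ² ≈ 4.39 (p ≈ 0.036) for rs1393350 and χ² ≈ 4.18 (p ≈ 0.041) for rs1042602, close to the reported 4.38 and 4.17. In both cases the observed number of heterozygotes exceeds the expected number (48 versus about 41.5, and 92 versus about 79.7).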

Prediction Model
For each of the six algorithms tested, a global accuracy score was generated, representing the probability of predicting all three characteristics correctly, together with separate scores for eye, skin and hair color. KNeighborsClassifier and DecisionTreeClassifier performed best for global accuracy (43.7%). For eye color, RadiusNeighborsClassifier and Support Vector Classifier performed best (73.8%); Support Vector Classifier also performed best for skin color (61.9%), and RadiusNeighborsClassifier and Support Vector Classifier performed best for hair color (76.2%) (Table 7).
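The gap between the per-trait scores (above 60 to 70%) and the global score (43.7%) is expected if the global score requires every trait to be right at once. The sketch below illustrates this under that assumption, which follows the description of the global score as the probability of getting all three characteristics correct; the labels are invented for illustration and the function name `accuracy_scores` is hypothetical.

```python
TRAITS = ("eye", "hair", "skin")


def accuracy_scores(truth, predicted):
    """Per-trait accuracy plus a global score that requires all three traits to be correct."""
    n = len(truth)
    per_trait = {
        t: sum(y[t] == p[t] for y, p in zip(truth, predicted)) / n for t in TRAITS
    }
    global_score = sum(
        all(y[t] == p[t] for t in TRAITS) for y, p in zip(truth, predicted)
    ) / n
    return per_trait, global_score


# Invented example with four individuals; each is wrong on at most one trait.
truth = [
    {"eye": "brown", "hair": "black", "skin": "black"},
    {"eye": "brown", "hair": "brown", "skin": "brown"},
    {"eye": "green", "hair": "blond", "skin": "white"},
    {"eye": "brown", "hair": "black", "skin": "brown"},
]
predicted = [
    {"eye": "brown", "hair": "black", "skin": "black"},  # all three correct
    {"eye": "brown", "hair": "black", "skin": "brown"},  # hair wrong
    {"eye": "brown", "hair": "blond", "skin": "white"},  # eye wrong
    {"eye": "brown", "hair": "black", "skin": "black"},  # skin wrong
]
per_trait, global_score = accuracy_scores(truth, predicted)
print(per_trait, global_score)
```

Here every trait is predicted with 75% accuracy, yet only one individual in four has all three traits right, so the global score is 25%.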

Discussion
Self-reported eye, hair and skin color is a methodology already used in some studies in Brazilian populations. Ancestry proportions vary across regions of Brazil because of the peculiarities of their colonization: there was a greater Native American contribution in the North and a greater African contribution in the Northeast, while both contributions were low in the South (Salzano & Sans, 2014). In the Southeast, ancestry proportions show a larger European contribution (78.5%), followed by African (14.7%) and Amerindian (6.7%); in the Northeast, the European contribution is 42.9%, the African 50.8% and the Amerindian 6.4% (Kehdy et al., 2015). These differences highlight the importance of evaluating genetic markers in populations from different regions of Brazil, since the Brazilian population is one of the most heterogeneous in the world (Salzano & Sans, 2014).
The most frequent genotype of rs6058017 was AA, followed by GA and GG (59.9%, 33.7% and 6.4%, respectively). These data agree with Kanetsky et al., who also found a higher frequency of genotype AA (78%) in a sample of the US population, and with Lima et al., who found genotype AA at a frequency of 70.8% in a southeastern Brazilian population (de Araújo Lima et al., 2015). In a study of an African American population by Bonilla et al., the most common genotype was GG (66%) (Bonilla et al., 2005). In our study, the mutant allele A had a frequency of 76.7% and the ancestral allele G 23.3%, in line with the data of Lima et al. for southeastern Brazil, where allele A was more frequent (84%) than allele G (16%) (de Araújo Lima et al., 2015). Allele A is also found at high frequency in Caucasians (93%) and Indians (76%) from Australia (Sulem et al., 2008; Zeigler-Johnson et al., 2004).
According to Maroñas et al., rs6058017 (ASIP) is one of the most important markers of dark pigmentation: allele G is found in almost 100% of individuals with this pigmentation and is often associated with dark eye, hair and skin color, whereas allele A has the opposite effect and is present in individuals with light pigmentation (Maroñas et al., 2014). In the present study, we found a significant association between rs6058017 and skin color. Genotypes GA and AA were associated with brown skin: one copy of allele A was associated with about 29 times higher odds of brown rather than black skin, and two copies with 45.5 times higher odds. The joint analysis of GA + AA gave an odds ratio of 37.33, reinforcing the effect of this allele (Table 3A). Genotype AA was also associated with 18.67 times higher odds of white skin; the joint analysis of GA + AA was also significant, although the odds ratio dropped to 13.42 compared with black skin (Table 3A). These data show a tendency of allele A to be strongly associated with non-black skin color, which was also observed in the grouped analysis: carriers of genotype GA had 9.52 times higher odds of non-black skin color, carriers of two copies of allele A had 11.13 times higher odds, and the combined analysis of GA + AA gave 10.50 times higher odds, showing the strength of this allele (Table 6A).
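The odds ratios quoted in these comparisons come from 2×2 tables crossing genotype (carrier versus non-carrier) with phenotype (for example, brown versus black skin). The sketch below computes an odds ratio and a two-sided Fisher's exact p-value with the Python standard library. The text does not say which test produced the reported p-values, so Fisher's exact test is an assumption here, the counts are invented rather than taken from Table 3A, and the function names are hypothetical.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR for the table [[a, b], [c, d]]: rows = genotype carriers / non-carriers,
    columns = case phenotype (e.g. brown skin) / reference phenotype (black skin).
    Undefined when b or c is zero."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability of x cases among the row1 carriers, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so that tables tied with the observed one are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))


a, b, c, d = 10, 5, 2, 12  # invented counts, not the study's data
print(f"OR = {odds_ratio(a, b, c, d):.2f}, p = {fisher_exact_two_sided(a, b, c, d):.4f}")
```

An OR of 12 means the odds of the case phenotype are twelve times higher among carriers, which is what "times more chance" expresses; it is a ratio of odds, not of probabilities. When a cell is zero the OR is undefined, and a continuity correction (for example, adding 0.5 to every cell) is a common workaround.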
Zaorska et al. found an association between rs6058017 and skin color using neural networks in a Polish population, and reported an influence of this SNP on skin sensitivity to the sun and on the presence of freckles (Zaorska et al., 2019). Our data corroborate the findings of Lima et al., in which the homozygous genotype AA was related to white and brown skin in a southeastern Brazilian population, with 8.6 times higher odds of white skin and 5.1 times higher odds of brown skin; as in the present study, no significant association was found with eye or hair color (de Araújo Lima et al., 2015). Durso et al. did not find any association between this SNP and skin color in a population from Rio de Janeiro, Brazil (Durso et al., 2014).
In Asians, the ancestral allele G has a frequency of 10 to 28%, whereas in Africans it reaches approximately 80%, suggesting an association of this allele with dark eye, skin and hair color, which predominate in these populations (Bonilla et al., 2005; Kanetsky et al., 2002; Voisey et al., 2006). Our data agree with these reports, indicating a tendency of allele A to be associated with non-black skin pigmentation and of allele G with black skin pigmentation.
The ancestral allele C of rs16891982 (SLC45A2) was the more frequent (60.1%), followed by the mutant allele G (39.9%), in agreement with Fracasso et al., who found a frequency of 64.33% for allele C in a southeastern Brazilian population. According to Sawitzki et al., who analyzed data from populations worldwide in ALFRED (The Allele Frequency Database), allele C has a frequency of 88% in populations with a high melanin index and 6% in populations with a low melanin index, while allele G has a frequency of 11% in populations with a high melanin index and 93% in those with a low melanin index (Sawitzki et al., 2017). In the present study, one copy of allele G was associated with 5.14 times higher odds of white skin, and two copies with 30 times higher odds, compared with black skin. The joint analysis of CG + GG was also significant, with an odds ratio of 6.8 (Table 3A). Thus, allele G seems to be associated with low-melanin phenotypes such as white skin.
Fridman et al. reported an association of genotypes CG (OR: 3.08) and GG (OR: 16.35) with white skin in a southeastern Brazilian population; as in the present study, the odds of white skin were highest for carriers of genotype GG. In our sample, the most frequent genotype was CG (57.8%), followed by CC (31.1%) and GG (11%). Hernando et al. studied a population from Spain and reported that genotype GG was the most common there; the ancestral allele C appeared to have a protective effect on skin sensitivity to the sun (Hernando et al., 2018). Soejima and Koda described allele G as being at high frequency and allele C at low frequency in European populations, with the opposite pattern in South African populations, where the ancestral allele C predominates and allele G is very rare (Soejima & Koda, 2006).
According to Leite et al., individuals with lighter skin tend to have a higher proportion of European ancestry, while individuals with darker skin have a greater proportion of African ancestry (Leite et al., 2011a). Our data follow the same trend, which reflects the migration processes in Brazil that brought together populations from different continents and mixed their genetic backgrounds (Hart et al., 2013).
The presence of allele C at rs16891982 encodes the amino acid leucine, which plays an important role in proton transport, maintaining an optimal pH within melanosomes that allows tyrosinase activity and adequate production of eumelanin, the pigment abundant in darkly pigmented individuals. When allele G is present, it encodes a different amino acid, phenylalanine, which may alter the pH and eumelanin synthesis (Sawitzki et al., 2017); this may be associated with low-melanin characteristics such as light skin, eyes and hair. Our data are consistent with this and also suggest an association of allele G with fair skin pigmentation.
The polymorphic allele A of rs1426654 (SLC24A5) was found at a frequency of 71.6% and the ancestral allele G at 28.4%. Lima the equator line is lower, according to the authors. This pattern is consistent with natural selection having reduced melanization in Europe (Canfield et al., 2013). The present study population had a higher frequency of heterozygotes (48.9%), followed by AA (47.1%) and GG (4%), unlike the data of Lima et al. for southeastern Brazil, where AA was the most frequent genotype (67.5%), followed by GA (23.4%) and GG (9.1%). Lima et al. also reported a strong association of genotype AA with fair skin (OR: 47.8) compared with the black skin group, and the heterozygous genotype GA was associated with the non-black skin color group (de Araújo Lima et al., 2015).
In the present study, significant associations were found between rs1426654 and skin color. Genotype AA was strongly associated with brown skin (p = 0.0006; OR: 60.0): carriers of genotype AA had 60 times higher odds of brown skin than of black skin. The joint analysis of GA + AA also showed an association with brown skin (p = 0.01; OR: 13.54) (Table 3A). Homozygous genotype AA was strongly associated with white skin: carriers of two copies of allele A had 40 times higher odds of white skin (p = 0.002; OR: 40.0) (Table 3A). No significant associations were found for the heterozygous genotype GA alone, only in the joint analysis of GA + AA (Table 3A), indicating a relationship between genotype AA and non-black skin. In a population from Brasília (Central-West Brazil), the homozygous genotype GG of rs1425564 was associated with an increased melanin index, and the same work reported a strong association between rs1426654 and skin color (Leite, Fonseca, França, Parra, & Pereira, 2011b). However, the combined effect of alleles A and G is not well understood, and further studies are needed to explain this relationship (Durso et al., 2014).
This tendency of genotype AA to be associated with non-black individuals was also found in the grouped analysis. Homozygous genotype AA was strongly associated with non-black skin color (p = 0.0004; OR: 35.56), and the joint analysis of GA + AA also showed an association with non-black skin (p = 0.01; OR: 7.33) (Table 6A). These data agree with Maroñas et al., who described rs1426654 as the first marker for differentiating black and non-black individuals (Maroñas et al., 2014). In southern Brazil, a high frequency of allele A and a very low frequency of allele G were reported in individuals with a low melanin index (Sawitzki et al., 2017). Our data show the same trend: allele A is associated with light skin pigmentation and allele G with dark skin pigmentation.
We found a high frequency of the ancestral allele G of rs885479 (94.1%), whereas the polymorphic allele A had a frequency of 5.9%. No individuals with genotype AA were found; genotype GG was the most frequent (88.2%), followed by the heterozygous genotype GA (11.8%). Our data are consistent with ENSEMBL (www.ensembl.org) ("Ensembl," n.d.) and NCBI (www.ncbi.nlm.nih.gov) ("NCBI," n.d.), both of which report a low global frequency of allele A (19%), with allele G reaching 80%. The polymorphic allele is common in East Asian populations, reaching frequencies of up to 73% (Motokawa et al., 2006; SHI, LU, LUO, XIANG-YU, & ZHANG, 2001). In African populations, allele G appears at very high frequencies (over 90%) and allele A at very low frequencies (Deng & Xu, 2018). Some studies have associated this polymorphism with skin pigmentation (Adhikari et al., 2019; Chaitanya et al., 2018; Hart et al., 2013). No significant associations were observed between rs885479 and EVC in the present study (Tables 1A-6A), possibly because of the low frequency of allele A and the number of individuals analyzed; a larger sample would be required to detect possible associations.
Our data indicate that rs1800404 (OCA2) plays an important role in pigmentation and in defining Externally Visible Characteristics. We found associations between allele A of rs1800404 and light phenotypes, and between the ancestral allele G and dark phenotypes, in agreement with Andrade et al. in a study conducted in southeastern Brazil. Homozygous genotype AA was strongly associated with non-brown eye color (p = 0.01; OR: 15.80), corresponding to about 15.8 times higher odds of non-brown eyes (Table 4B). When the GA and AA genotypes were combined, an association was also found, but with a lower odds ratio of 9.23 (p = 0.04) (Table 4B). These data suggest that one copy of allele A is not sufficient to determine non-brown eye color, which appears to require genotype AA.
In the category-level analyses, homozygous genotype AA was associated with 17.39 times higher odds of green eyes (p = 0.0084; OR: 17.39). When genotypes GA and AA were analyzed together, a significant association was also found (p = 0.03; OR: 10.03) (Table 1B); in this case, two copies of allele A seem to be necessary for the trait. Andrade et al., in a study conducted in southeastern Brazil, obtained similar results, with genotype AA associated with green eyes and blond hair.
Our data also agree with Andrade's findings for blond hair: genotypes GA (p = 0.01; OR: 15.0) and AA (p = 0.01; OR: 15.32) were each associated with about 15 times higher odds of blond hair (Table 2B). The joint analysis of GA + AA reinforces the association with blond hair and the effect of allele A, suggesting that one copy of allele A may be sufficient for this characteristic (p = 0.01; OR: 15.0) (Table 2B). The same trend was found in the grouped analysis: genotypes GA and AA and the joint GA + AA group were also associated with blond hair color when compared with the non-blond group (p = 0.02, OR: 11.03; p = 0.01, OR: 14.01; and p = 0.01, OR: 12.01, respectively) (Table 5B). In contrast, Adhikari et al., who analyzed Latin Americans, found no significant association between rs1800404 and hair color; in their study, this SNP was associated only with eye and skin color (Adhikari et al., 2019).
In the grouped analysis, genotypes GA, AA and GA + AA of rs1800404 were associated with non-black skin color (p < 0.0001, OR: 7.94; p < 0.0001, OR: 9.48; and p < 0.0001, OR: 8.53, respectively) (Table 6B). One copy of allele A gives 7.94 times higher odds of non-black skin color, and two copies (genotype AA) increase this to 9.48 times; the joint analysis again reinforces the effect of allele A. In the category-level analyses, this SNP was associated with brown skin and, more strongly, with white skin. For brown skin, the associations with GA, AA and GA + AA were p = 0.0002, OR: 6.8; p = 0.003, OR: 5.89; and p < 0.0001, OR: 6.45, respectively. For white skin, they were p < 0.0001, OR: 21.25; p < 0.0001, OR: 40.80; and p < 0.0001, OR: 28.77, respectively (Table 3B).
One copy of allele A gives 21.25 times higher odds of white skin, and genotype AA gives 40.8 times higher odds, showing a strong association between rs1800404 and skin color.
According to Crawford et al., the ancestral allele G is associated with dark skin pigmentation and is common in most Africans, while the polymorphic allele A is related to light skin pigmentation and is more common in Europeans, with a frequency above 70% (Crawford et al., 2017). In the present study, the polymorphic allele had a frequency of 56.3%. The same association, allele G with dark skin and allele A with light skin, was found in southeastern Brazil, corroborating our data. In the literature, rs1800404 is most commonly related to skin pigmentation, and this SNP showed a significant effect on skin color in African American and Afro-Caribbean populations (Deng & Xu, 2018).
In our study, the frequency of allele G, the most common allele in African populations, was 43.6%, and that of allele A, more common in European populations, was 56.3%. This genotypic and phenotypic variety is the result of about five centuries of admixture among Africans, Europeans and Native Americans in Brazil, which generated one of the most admixed populations in the world. Variation in the colonization and occupation of Brazilian regions and states created different degrees of genetic admixture across the country: the Northeast was the cradle of Portuguese colonization and the region where most enslaved Africans landed and settled, so greater African ancestry is observed there, while greater European ancestry is observed in the South. It is therefore important to study these molecular markers in different regions and states of the same country, and to identify more specific markers for each population (Magalhães da Silva et al., 2015; Pena et al., 2011; Souza, Resende, Sousa, & Brito, 2019).
Data relating rs1800404 to Externally Visible Characteristics are scarce in Brazil; our data show that this SNP may be a phenotypic predictor for the Pernambuco population. Phenotypic prediction is nevertheless a complex process involving many genetic and environmental factors, and a set of markers is required to predict a trait. rs1800404 may be an important marker which, together with other markers associated with the same traits, could be used for phenotypic prediction.
No significant associations were found between rs1800407 (OCA2) and the externally visible characteristics evaluated (Tables 1B-6B), although other studies associate the derived allele A with light iris pigmentation, such as blue eyes (Andersen et al., 2016; Pośpiech et al., 2016; Walsh et al., 2011). In the present study, however, the derived allele A was found at a very low frequency (6.8%), while the ancestral allele was found at a frequency of 93.2%. This low frequency of allele A, together with the sample size of this study, may explain the absence of significant associations. No individuals with genotype AA were found, directly reflecting the low frequency of the polymorphic allele in this population. This is expected, because allele A is largely restricted to Europe (0-11%; e.g. Italian 9.7%, Portuguese 7.5%) and occurs at low frequencies elsewhere (African Americans 1.7%, Chinese 0.9%) (Andersen et al., 2016; Donnelly et al., 2012; Gu et al., 2011). A study conducted by Andrade et al. in Brazil reported the polymorphic allele at a frequency of approximately 8%, and allele G was associated with dark skin color.
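The absence of AA homozygotes can be checked against Hardy-Weinberg expectations. Assuming Hardy-Weinberg equilibrium and independent sampling of individuals (assumptions of this sketch, not analyses reported here), the expected number of AA individuals among the 176 genotyped is n·q², and the probability of observing none is (1 - q²)ⁿ, with q = 0.068 taken from the frequency reported above:

```python
n = 176     # individuals genotyped in this study
q = 0.068   # observed frequency of the derived allele A at rs1800407

# Under HWE (assumed), each individual is AA with probability q^2.
expected_aa = n * q ** 2
# Probability that none of the n individuals is AA (binomial zero class).
p_zero_aa = (1 - q ** 2) ** n

print(f"expected AA individuals: {expected_aa:.2f}")
print(f"probability of observing no AA: {p_zero_aa:.2f}")
```

With these values fewer than one AA individual is expected and there is roughly a 44% chance of sampling none, so the missing AA class is consistent with the low allele frequency rather than evidence of a deviation on its own.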
Deviations from Hardy-Weinberg equilibrium (HWE) were found for rs1042602 (TYR) and rs1393350 (TYR) (χ² = 4.17, p = 0.04 and χ² = 4.38, p = 0.03, respectively). These deviations may be associated with heterozygote excess for rs1042602 (Chen et al., 2017) or may result from population stratification, a common event in mixed populations such as that of Brazil (de Araújo Lima et al., 2015). The most frequent genotype of rs1042602 was CA, followed by CC and AA (52.3%, 39.2% and 8.5%, respectively), as also found by Hernando et al. in a sample of the Spanish population (Hernando et al., 2018). The genotypic frequencies of rs1393350 were GG (72.7%) and GA (27.3%); no individuals with genotype AA were found, which may also have contributed to the deviation from HWE.
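The χ² values above compare observed genotype counts with those expected from the allele frequencies under Hardy-Weinberg equilibrium. The sketch below implements a standard Pearson goodness-of-fit test with 1 degree of freedom; that the study used exactly this test is an assumption, and the function name `hwe_chi2` is hypothetical. The genotype counts are back-calculated from the percentages reported in this paragraph (n = 176), so they are approximations rather than the study's raw table.

```python
import math


def hwe_chi2(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Pearson chi-square test (1 df) for HWE at a biallelic SNP.

    Returns (reference allele frequency, chi-square statistic, p-value).
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic locus: HWE test is undefined")
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Counts reconstructed from the reported percentages (approximate, n = 176).
snps = {
    "rs1042602 (CC, CA, AA)": (69, 92, 15),
    "rs1393350 (GG, GA, AA)": (128, 48, 0),
}
for snp, counts in snps.items():
    freq, chi2, pval = hwe_chi2(*counts)
    print(f"{snp}: reference allele {freq:.3f}, chi2 = {chi2:.2f}, p = {pval:.3f}")
```

With these reconstructed counts the statistics come out close to the reported values (about 4.18 and 4.39). For rs1042602 the observed CA count (92) exceeds the expected one (about 80), consistent with a heterozygote excess, whereas for rs1393350 most of the statistic comes from the empty AA class (about 3.3 individuals expected).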
The ancestral allele C of rs1042602 was found at a frequency of 65.3%, and the polymorphic allele A at 34.7%. Sawitzki et al., studying a population in South Brazil, found allele C at a frequency of 86% and allele A at 13% in people with high melanin content (from African populations), whereas in people with low melanin content (from European populations) allele C was found at 62% and allele A at 37%. In an analysis of populations from around the world using data from ALFRED, Sawitzki et al. also found allele C at higher frequencies, and our findings corroborate these data (Sawitzki et al., 2017).
No associations were found between rs1042602 and eye, hair and skin color in the present study (Tables 1B-6B). When allele C is replaced by the polymorphic allele A, the catalytic activity of tyrosinase decreases by about 40%, directly affecting melanin production; thus, the presence of allele A is strongly associated with low melanin content (Sawitzki et al., 2017). Other studies have shown associations of rs1042602 with the presence or absence of freckles (Kukla-Bartoszek et al., 2019), as well as with eye, skin and hair color in various populations (Chaitanya et al., 2018).
Durso et al. also found an association between rs1042602 and skin color in a Rio de Janeiro population (Durso et al., 2014).
The lack of significant associations with EVC in our work may be due to the sample size; more individuals may need to be evaluated to detect significant associations.
The ancestral allele G of rs1393350 (TYR) was observed at a frequency of 86.3% and the mutant allele A at 13.7%; this result is expected, as genotype AA has a low frequency in many populations worldwide (Walsh et al., 2011). This polymorphism has been associated with eye, skin and hair color, especially eye color (Virmond et al., 2016; Walsh et al., 2011; Yun et al., 2014). In our study, no significant associations were observed between rs1393350 and skin, hair or eye color (Tables 1B-6B). Pośpiech et al. found the polymorphic allele at a frequency of about 22% and also found no association with eye color. In a European population, allele A was associated with 1.29 times higher odds of blond rather than brown hair, 1.29 times higher odds of blue rather than green eyes, and increased skin sensitivity to the sun (Sulem et al., 2007). In our study, the absence of association may be due to the low frequency of allele A in our sample. Hohl et al., studying a Buenos Aires population, also found no significant associations between rs1393350 and eye color and noted that the genetic context of iris color in Argentina differs from that of other countries because of population miscegenation (Hohl, Bezus, Ratowiecki, & Catanesi, 2018), as is also the case in Brazil. According to Hohl et al., markers already established in European countries should be applied with care to populations with different characteristics.
The algorithms tested for a possible prediction model showed encouraging results for a heterogeneous population, given that most studies proposing prediction models were carried out in homogeneous populations. The best-performing algorithms reached a global performance above 40%, with best performance above 70% for eye color, above 75% for hair color and 61.9% for skin color (Table 7). These skin color values are lower than those reported by Maroñas et al. and Walsh et al. (Maroñas et al., 2014; Walsh et al., 2017), who found prediction values of 70 to 97% for skin color, whereas Valenzuela et al. (Valenzuela et al., 2010) found a value lower than ours, 45.7%. In a Polish population of 150 individuals, Zaorska et al. (Zaorska et al., 2019) found a value of 58.3% for skin color with the Random Forest algorithm, which was also tested in the present study; we obtained 54.8% with Random Forest (Table 7), lower than the 61.9% obtained with the support vector machine algorithm (Table 7), which showed the best performance for skin color.
The differences among the studies mentioned may be due to sample size, to the algorithm used, or to the characteristics of the populations studied, which are homogeneous, unlike the heterogeneous population of this study. The values obtained with the tested algorithms are satisfactory for a future prediction model for this population: some exceed 70% for eye and hair color, and for skin color our best model (61.9%) performed better than the 58.3% reported for a population of similar sample size by Zaorska et al. (Zaorska et al., 2019). These results are an encouraging starting point for building a prediction model for the heterogeneous Pernambuco population and other heterogeneous populations.
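The prediction rates in Table 7 are summary figures. Assuming they represent the proportion of correctly classified individuals (the metric is not specified in this section), the sketch below shows how an overall accuracy and per-category recall are obtained from a confusion matrix, and how a single overall figure can hide weaker performance in some categories. The matrix and the function name `accuracy_report` are hypothetical and are not derived from Table 7.

```python
def accuracy_report(labels, matrix):
    """Overall accuracy and per-class recall from a square confusion matrix.

    matrix[i][j] = number of individuals whose self-reported class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    # Recall: fraction of each self-reported class that was predicted correctly.
    recall = {lab: matrix[i][i] / sum(matrix[i]) for i, lab in enumerate(labels)}
    return correct / total, recall


# Hypothetical skin color confusion matrix (rows: self-reported, columns:
# predicted). Illustrative numbers only, not the study's results.
labels = ["white", "brown", "black"]
matrix = [
    [30, 8, 2],
    [10, 50, 12],
    [3, 9, 18],
]
acc, recall = accuracy_report(labels, matrix)
print(f"overall accuracy: {acc:.1%}")
for lab, r in recall.items():
    print(f"  recall {lab}: {r:.1%}")
```

In this made-up example the overall accuracy is about 69%, while only 60% of the "black" category is classified correctly, which is why per-category rates are worth reporting alongside a global value.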
The associations found in our data and the values obtained with the prediction models show the importance of evaluating SNPs for DNA Phenotyping, which is emerging as a promising forensic tool because of its potential to provide new information in criminal investigations. With this technique it is possible to infer Externally Visible Characteristics, as well as ethnicity and biogeographic ancestry (Queirós, 2019).
However, data are still scarce for non-European and mixed populations such as that of Brazil, and further studies with larger numbers of individuals are needed. Nevertheless, identifying associations of SNPs with EVC and measuring their strength, as done in our study, can support phenotypic prediction in heterogeneous populations.

Conclusion
Our results suggest that the rs6058017, rs16891982 and rs1426654 polymorphisms may be used as phenotypic predictors of skin color, and rs1800404 as a phenotypic predictor of skin, hair and eye color in the Pernambuco population, Brazil, together with other markers associated with the same characteristics. The prediction models evaluated showed satisfactory prediction rates, above 60% for skin color and above 70% for eye and hair color. These associations and models can assist forensic DNA phenotyping in the Pernambuco population and other mixed populations. The results provide a starting point for the development of prediction models for heterogeneous populations. However, further studies are necessary to confirm the predictive power of these SNPs and of the prediction model. The markers evaluated here can be studied in other populations from different regions of Brazil, and other EVC markers can be added to the panel used in this study.