A method based on pix2pix to attenuate bias in the analysis of wound healing assays

Advances in machine learning have led to the development of conditional generative adversarial networks that operate directly on images, such as the pix2pix model. A potential application of the pix2pix model discussed in this work is the analysis of images from wound healing (or scratch) assays, which are widely used to evaluate in vitro cell migration. The most common way to evaluate the results of a wound healing assay is to manually detect the wound area in the image, separating the empty area from the area occupied by cells, over 24, 48 or even 72 h. Although this procedure has long been used in the literature, it has been pointed out that it lacks objectivity, is time-consuming, and leads to data misinterpretation. In an attempt to overcome the lack of robustness and consistency shown by manual evaluation, this work implements a method based on pix2pix to reduce bias in wound healing analysis, while introducing a new point of view on image analysis. Manually introduced bias in the image processing algorithm produced deviations of up to 15 % when a single variable was slightly varied, while the image processing performed by the model resulted in deviations mostly within 6 % when compared with manual analysis.


Introduction
Wound healing is a complex process in the body, which involves different cell populations and extracellular matrix.
The healing process occurs basically in four stages: hemostasis, inflammation, proliferation and tissue remodeling. Migration occurs as a cellular response to the stimulus of an injury, involving keratinocytes, fibroblasts, macrophages, platelets and endothelial cells. These cell types drive and sustain healing through several growth factors and cytokines (Guo & DiPietro, 2010; Tonnesen et al., 2000; Velnar & Gradisnik, 2018). Migration is usually assessed in epithelial, endothelial and fibroblast cell lines (Monsuur et al., 2016) due to their adherence potential.
Among the variety of methods used to evaluate cell migration, the scratch assay is the simplest one (Justus et al., 2014; Mouritzen & Jenssen, 2018): an in vitro technique used to evaluate the cell migration rate. It is based on the mechanical, chemical or thermal removal of a group of cells belonging to a confluent monolayer, followed by cell migration towards the resulting empty region, triggered by the lack of direct cell-to-cell contact (Rodrigues et al., 2019). However, the results of this assay are commonly assessed manually (Ieso & Pei, 2018), which, in addition to being a time-consuming procedure, leads to subjectivity and, occasionally, data misinterpretation (Simpson et al., 2014).
In the image analysis, selection of the wound edge is performed manually, which is very subjective and can vary depending on the person performing the measurement (Zordan et al., 2011). Although the method seems quite simple, it can be difficult to detect the wound-closure event, because cells normally do not form a perfect monolayer (Jonkman et al., 2014).
To improve the analysis process, several software tools aiming to quantify migration via wound border detection have been developed (Geback et al., 2009; Nunes & Dias, 2017), since it has been stated that cells at the edge of an empty surface begin to migrate in order to fill this empty space (Auerbach et al., 1991). Although these tools analyze the images from the migration assay, they still require manual inspection and correction of variables such as brightness and contrast. On the other hand, there are reports of analyses in which a better understanding of the wound healing assay is achieved when considering the cell coverage area (Choudhury et al., 2014).
In this sense, the development of automated methods of analysis represents an important contribution to the area. Furthermore, with the advent of generative adversarial networks (GANs), image generation and image-to-image translation have greatly evolved (Goodfellow et al., 2020). An example of an image-to-image translation model is the pix2pix method, which has already demonstrated wide applicability, for instance in satellite-to-map, day-to-night, labels-to-facade, and black-and-white-to-color translations (Isola et al., 2017). GANs are based on a generator model intended to create synthetic images and on a discriminator model that classifies images as fake (images created by the generator) or real (input images). While the discriminator is directly updated in order to better classify the images, the generator is updated according to the discriminator, aiming to better fool it (Goodfellow et al., 2020). A special case of GANs are the conditional GANs (or cGANs), where the generator is conditioned on an input (Mirza & Osindero, 2014). In the case of the pix2pix model (a type of cGAN), the generator is conditioned on an input image and is based on a "U-Net" architecture, while the discriminator is based on a convolutional "PatchGAN" classifier (Isola et al., 2017).
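Concretely, the adversarial objective just described can be written (following Isola et al., 2017) as a conditional GAN loss combined with an L1 reconstruction term, where x denotes the input image, y the target image, and λ a weighting hyperparameter:

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\left[\log D(x,y)\right]
                        + \mathbb{E}_{x}\left[\log\bigl(1 - D(x,G(x))\bigr)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[\lVert y - G(x)\rVert_{1}\right]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)
```

The L1 term encourages the generator output to stay close to the ground truth at the pixel level, while the adversarial term pushes it towards realistic-looking segmentations.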
In this context, this work describes a method implemented as a fully automated network that analyzes a set of wound healing assay images considering the cell coverage area. The pix2pix model operates without human interaction, expensive equipment, or extensive sets of image analyses, drastically reducing errors arising from biased analyses and enhancing the results obtained from the wound healing assay.

Dataset
This research uses images of wound healing assays from a database containing 400 images acquired by three investigators from two independent laboratories. Cell lines were cultured according to (Favretto et al., 2021). Images of the human endothelial cell line EA.hy926 (ATCC®CRL2922, Manassas, VA, USA) were acquired with a DEM35 digital eyepiece for microscope (0.3 Mpixels, MiniSee Software, magnification of 5 X, SCOPETEK, Hangzhou, China) with a 4 X microscope magnification (OPTIPHASE Inverted microscope, Los Angeles, USA). An inverted microscope AE2000 (MOTIC, Hong Kong, China) with a magnification of 40 X was used to obtain images from the adenocarcinoma MCF-7 (ATCC®HTB-22) and MDA-MB-231 (ATCC®HTB-26) cell lines. Images covered total widths of 1 mm for endothelial cells and 2.93 mm for adenocarcinoma cells.

Ground Truth Dataset
In order to obtain the ground truth images, a sequence of image processing operations was performed, with steps illustrated in Figure 1. Within this algorithm, the real width (RW, in mm) of the photographed area is used as a regulating parameter (Step I).

Figure 1.
Steps of the developed algorithm to achieve the ground truth dataset from raw input images. These steps include resizing, Gaussian blur, automatic brightness and contrast adjustments, adaptive threshold, and Canny edge detection method.
Source: Authors.

An adaptive threshold is then applied (Step IV) in order to eliminate cell debris (small particles). This threshold is used as a binarization method to segment the image according to a pixel intensity cutoff, which turns each pixel intensity into 0 or 100 % (black and white, respectively).
The Canny edge detection method (Canny, 1986) is then used to enhance the cell edges, followed by a contour-drawing step (Figure 1g, Step V). This edge detection allows the location of cells to be enhanced (Figure 1h and i) by a two-step procedure that repeats a Gaussian blur followed by a threshold (as previously described), with large (Step VI) and small (Step VII) kernel sizes, respectively. When using this sequence of image processing operations, it is possible to set a single configuration of parameters (such as the RW value and kernel sizes) for a set of images, or to adapt them to each case in order to achieve the best output. Thus, even though this method was designed to eliminate biased results, these parameters were changed in some cases in order to improve the ground truth dataset.
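As an illustration, the binarization and area-measurement core of such a pipeline can be sketched with NumPy alone (the full algorithm also includes resizing, brightness/contrast adjustment, and Canny edge detection; all function names, kernel sizes and cutoff values below are illustrative assumptions, not taken from the original implementation):

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2-D Gaussian kernel with odd side `size`."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, size=5, sigma=1.0):
    """Blur a grayscale image by direct 2-D convolution (edge-padded)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += k[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def binarize(img, cutoff):
    """Segment by an intensity cutoff: each pixel becomes 0 or 255."""
    return np.where(img >= cutoff, 255, 0).astype(np.uint8)

def occupied_area_fraction(binary):
    """Fraction of the image covered by cells (white pixels)."""
    return float((binary == 255).mean())

# Toy example: a dark "wound" stripe in the middle of a bright cell layer.
img = np.full((64, 64), 200.0)
img[:, 24:40] = 30.0                       # empty scratch region (25 % of the frame)
mask = binarize(gaussian_blur(img), cutoff=100)
print(round(occupied_area_fraction(mask), 2))  # → 0.75
```

Deviations such as those in Figure 2 arise precisely because values like the cutoff and kernel size here are chosen by the operator.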
Slight adjustments of the variables indicated in Figure 2 resulted in variations of up to 9 % in Step I, 1 % in Step II, 3 % in Step III, 15 % in Step IV, 4 % in Step V, 1 % in Step VI, and 6 % in Step VII. A 15 % variation in the final detected area occupied by cells, as shown in the graph of Figure 2 (Step IV), illustrates the bias present in this type of manual evaluation.
With both the original input and the expected output (ground truth, GT) defined, the 400 images were separated into training and test sets containing 70 % (280 images) and 30 % (120 images) of the total number of images, respectively.
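A minimal sketch of this 70/30 split (the fixed seed and file names are illustrative assumptions; the original split procedure is not described in detail):

```python
import random

def split_dataset(image_paths, train_fraction=0.7, seed=0):
    """Shuffle the dataset and split it into training and test sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)   # fixed seed for reproducibility
    n_train = round(len(paths) * train_fraction)
    return paths[:n_train], paths[n_train:]

# 400 (input, ground-truth) pairs -> 280 training / 120 test images.
pairs = [f"img_{i:03d}.png" for i in range(400)]
train_set, test_set = split_dataset(pairs)
print(len(train_set), len(test_set))  # → 280 120
```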

Pix2pix
The pix2pix model, as described in the introduction, is a conditional GAN in which the discriminator (D) learns to discriminate between real and fake combinations, while the generator (G) learns to fool D, as contextualized to the problem presented herein in Figure 3.

Figure 3.
Training a pix2pix model to map an input image to a black and white image. The discriminator indicates that the ground truth corresponds to the correct output of the original input image (left). The generator tries to fool the discriminator by creating a fake output image (G(x)) while the discriminator learns to distinguish between fake and real output images (right).
The pix2pix model was implemented in Keras with TensorFlow as backend (Abdelmotaal et al., 2021), with GPU usage; training was performed on the training dataset for 100 epochs, saving the model state every 10 epochs for progress evaluation. For this, a computer with 16 GB RAM, an Intel® Core™ i7-7700HQ CPU @ 2.80 GHz (x64-based processor), and an NVIDIA GeForce GTX 1070 was used. The occupied areas obtained from the ground truth dataset were compared with the occupied areas obtained from the images output by the developed network. Processing times of the network and of the ground truth algorithm were also compared.
To further test the limits of the trained model, input test images were rotated by 90°, 180° and 270° and flipped horizontally and vertically, generating 7 new images per original dataset image. Since the input images are not square, the image processing by the models (which is based on 256 x 256 images) can be evaluated by comparing original, rotated and flipped images, which should present identical occupied areas.
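The 8 orientation variants per image (identity, three rotations, and their horizontal flips, which together cover the vertical flips as well) can be generated with NumPy; the helper name and toy image below are illustrative. The occupied-area fraction is invariant under all of these transforms, which is what the comparison exploits:

```python
import numpy as np

def orientation_variants(img):
    """Return the 8 orientation variants of an image: the identity plus
    90/180/270-degree rotations, and the horizontal flips of each
    (equivalent to combining rotations with horizontal/vertical flips)."""
    variants = [np.rot90(img, k) for k in range(4)]              # 0, 90, 180, 270 degrees
    variants += [np.fliplr(np.rot90(img, k)) for k in range(4)]  # flipped counterparts
    return variants

# Non-square toy image: 75 % "occupied by cells" (white pixels).
img = np.zeros((40, 60), dtype=np.uint8)
img[:, :45] = 255
areas = [float((v == 255).mean()) for v in orientation_variants(img)]
print(len(areas), set(areas))  # 8 variants, all with occupied area 0.75
```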

Results and Discussion
The total training time was 1 h 26 min, from which 10 resulting models (e10-e100) were used to evaluate the area occupied by cells in the training and testing datasets containing 280 and 120 images, respectively (see Figure 4). From Figure 4a it is possible to verify that e60-e100 presented similar average percentage deviations of area in relation to GT among the models generated by pix2pix, but e100 presented the smallest value and was therefore used for further tests.
Evaluating the area occupied by cells obtained by GT and by e100 for each output image individually, their differences can reach up to 18 % in specific cases. Figure 5 shows the images for which the difference in the area occupied by cells resulting from the processing by e100 was greater than 10 % when compared with GT. Although differences of more than 10 % of the occupied area were observed between the outputs presented in Figure 5b and Figure 5c, the patterns are similar, and the e100 outputs tend to be more sensitive to empty areas than the GT outputs. This may indicate, once again, the effect of bias in the manual image processing, which is drastically reduced by the pix2pix method.
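This per-image comparison amounts to flagging images whose occupied-area percentages differ by more than a tolerance (the area values below are hypothetical, for illustration only; they are not data from the study):

```python
def area_deviation_percent(gt_area, model_area):
    """Absolute difference between two occupied-area percentages."""
    return abs(gt_area - model_area)

# Hypothetical per-image occupied areas (% of the frame), GT vs. e100 output.
gt_areas    = [62.0, 55.3, 71.8, 48.0]
model_areas = [60.5, 66.0, 71.1, 30.2]
flagged = [i for i, (g, m) in enumerate(zip(gt_areas, model_areas))
           if area_deviation_percent(g, m) > 10.0]
print(flagged)  # → [1, 3]  (images deviating from GT by more than 10 %)
```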
Rotating and flipping the images of the testing dataset, thereby generating 8 inputs from each original image that should result in the same occupied area, the e100 model presented an average error of about 0.3 % (see Figure 6, in which the curve fitted to the data points is a normal curve).
One of the outliers in Figure 6 (with an error larger than 1 %) is an image whose GT output had been altered manually by changing variables as presented in Figure 2. Figure 6 also demonstrates that although the inputs need to be resized, resizing does not affect the model output, since rotation produces an almost traceless percentage deviation.
Thus, the e100 model presented errors within (5.1 ± 0.3) % of the occupied area when compared with the GT, and the image orientation did not cause a large deviation (error of (0.27 ± 0.02) %). These errors represent a smaller percentage deviation than those shown in Figure 2, where the modification of a single variable originating from bias can lead to errors of up to 15 %, suggesting that manual analysis, even when supported by an algorithm, is biased. Image processing by these models took an average of 0.14 s. However, a limitation of this model is that it was still developed from a biased process, which could lead to misidentifications in the network processing.

Conclusion
A pix2pix model that reduces bias in the evaluation of wound healing assay images by removing human interaction from the analysis was developed, with the neural network assuming total control of the image processing. In contrast to previously published works (Geback et al., 2009; Nunes & Dias, 2017), this technique makes use of the total area occupied by cells and abolishes human interaction in the assay analysis, potentially reducing bias, reducing workload, and improving the results of this step of the assay. Furthermore, the entire set of models can be obtained in less than 2 hours with appropriate hardware, and the images can be processed in less than 1 second. Although the images were not acquired with equal numbers of pixels in the x and y directions, rotating them did not significantly alter the final outputs.
Thus, resizing these images does not significantly affect the outputs.
Future work aims to apply regularization to the cGAN, such as validation-based or cross-validation early stopping, and to produce a less biased algorithm to independently evaluate the healing images that will then be used as GT for the pix2pix model.