Classification of Pneumonia images on mobile devices with Quantized Neural Network

This paper presents an approach for the classification of child chest X-ray images into two classes: pneumonia and normal. We employ pre-trained Convolutional Neural Networks combined with a quantization process carried out through the TensorFlow Lite platform. This reduces processing requirements and computational cost. Results show accuracy of up to 95.4% and 94.2% for MobileNetV1 and MobileNetV2, respectively. The resulting mobile app also presents a simple and intuitive user interface.

Pneumonia occurs everywhere, but it is more prevalent in South Asia and sub-Saharan Africa (World, 2016).
Chest X-rays are often used to assess cases of pneumonia and are the most commonly used diagnostic test for chest-related diseases. A very small dose of ionizing radiation is used to produce the chest image.
Pneumonia causes pulmonary consolidation, meaning that the pulmonary alveoli are full of inflammatory fluid instead of air (Iorio, et al., 2018). The image identification of pneumonia, as shown in Figure 1, is related to the opacities seen on the radiograph. Normal lungs exhibit darker regions near the spine (bronchi filled with air (Kunz, et al., 2018)), whereas abnormal lungs show lighter (opaque) patches, since the alveoli are filled with fluid.
The low accuracy in the diagnosis of pneumonia may lead to excessive prescription of antibiotics, which is harmful to patients, and is also a cause of inventory waste. Antibiotics also kill beneficial bacteria, causing unintended health problems (Kurt, Unluer, Evrin, Katipoglu, & Eser, 2018). Moreover, the excessive use of antibiotics may lead to the proliferation of drug resistant bacteria.
Considering this scenario, computational systems capable of providing fast and accurate pneumonia diagnosis are of great importance and are becoming increasingly common (Manogaran, Varatharajan, & Priyan, 2018). Used as aid tools, they can minimize errors (Malmir, Amini, & Chang, 2017) while screening potentially infected patients.
A recent trend in classification is the use of deep learning techniques (especially Convolutional Neural Networks, CNNs) that can deliver high classification accuracy at the expense of high computing cost. To reduce this cost, several quantization schemes have gained attention recently, some focusing on weight quantization and others on activation quantization (Choi, et al., 2018).
As a result, extensive research on weight and activation quantization to minimize CNNs' computing and storage costs has been conducted, making it possible to effectively host such solutions on platforms with limited resources (for example, mobile devices) (Choi, et al., 2018). This paper describes a mobile device system capable of classifying children's chest X-ray images into two classes: Pneumonia and Normal. A pre-trained CNN is subjected to a quantization stage through the TensorFlow Lite platform (Jacob, et al., 2018), considerably reducing the computational cost and processing times.
The proposed method uses two pre-trained neural networks, known as MobileNetV1 (Howard, et al., 2017) and MobileNetV2 (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018), for the construction of a mobile application aiming at greater mobility. As a result, fast and accurate diagnosis of childhood pneumonia can be attained, especially in remote areas with precarious conditions. This paper comprises four sections: Section 2 presents materials and methods; results and conclusions are given in Sections 3 and 4, respectively.

The Proposed Method
We now present the proposed methodology for the training and classification of pneumonia from x-ray images on mobile devices.

Dataset
We start by describing the dataset used in the experiments. The images come from the Guangzhou Women and Children Medical Center and were taken from pediatric patients aged one to five years as part of routine clinical procedures. The dataset contains 5,856 chest X-ray images (anteroposterior), categorized as Viral Pneumonia (1,493), Bacterial Pneumonia (2,780) and Normal (1,583). The dataset underwent quality control, with garbled and low-quality images removed. Each diagnosis was given by two specialist physicians and checked by a third in order to minimize errors.
Figure 2 shows how the dataset was divided into training and validation sets, as well as the number of images in each class. The first two columns represent the train/test split: the blue column corresponds to the training set (70% of the images) and the orange column to the test set (30%). The last two columns show the number of images per class: the Normal class (blue) accounts for 27% of the images, while the Pneumonia class (orange) accounts for 73%.
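For reference, the split sizes implied by these percentages can be checked with a few lines of Python. The exact per-set counts are not stated in the paper; simple rounding is assumed here:

```python
# Class counts and total reported for the Guangzhou dataset.
TOTAL = 5856
counts = {"viral": 1493, "bacterial": 2780, "normal": 1583}

# Train/test split: 70% / 30% (assuming simple rounding).
n_train = round(TOTAL * 0.70)
n_test = TOTAL - n_train

# Two-class grouping used in this work: Pneumonia = viral + bacterial.
n_pneumonia = counts["viral"] + counts["bacterial"]
n_normal = counts["normal"]

print(n_train, n_test)                   # 4099 1757
print(round(100 * n_pneumonia / TOTAL))  # 73 (% pneumonia)
print(round(100 * n_normal / TOTAL))     # 27 (% normal)
```

The per-class counts sum exactly to 5,856, and the grouped proportions match the 73%/27% figures described for Figure 2.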

Method's Pipeline
The diagram illustrated in Figure 3 shows the method's main constituent parts. It comprises four main modules: a) a pre-trained model; b) a transfer learning process in which X-ray lung images are trained; c) quantization through TensorFlow Lite (Hubara, Courbariaux, Soudry, El-Yaniv, & Bengio, 2017), which optimizes the model for the mobile application; and d) the Android app for the final classification of X-ray images.
Research, Society and Development, v. 9, n. 10, e889108382, 2020 (CC BY 4.0) | ISSN 2525-3409 | DOI: http://dx.doi.org/10.33448/rsd-v9i10.8382

Transfer learning
Transfer Learning is a common technique in Deep Learning which aims at storing knowledge gained while solving one problem and applying it to a different but related problem. It is present in many applications (Abidin, et al., 2018; Douarre, Schielein, Frindel, Gerth, & Rousseau, 2018; Khatami, et al., 2018; Baltruschat, Nickisch, Grass, Knopp, & Saalbach, 2018; Chen, Dou, Chen, & Heng, 2018). The technique consists in using a model pre-trained on classes distinct from those of the problem to be solved (Wu, Qin, Pan, & Yuan, 2018). This is an advantage when working with small datasets (Shallu & Mehra, 2018), since it is difficult to obtain datasets large enough for specific problems (Ramalingam & Garzia, 2018), and it avoids having to train complex models such as VGG19, Xception and Inception V3 from scratch.
Transfer learning normally preserves the initial and intermediate layers, while the final layer is replaced and trained again (Ramalingam & Garzia, 2018). Figure 4 illustrates the transfer learning process.
For the training of the neural networks, all pre-trained weights are set as non-trainable, since they were learned on the ImageNet dataset. Hence, the last layer of each network is removed and four dense layers are added, with the last of these having the same number of neurons as the number of classes to be classified. The SoftMax function is used to activate the last layer of the networks in Figure 4.
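The SoftMax activation used in the final layer can be illustrated in plain Python. This is a minimal sketch of the function itself, not the Keras implementation used in the paper, and the logits below are hypothetical:

```python
import math

def softmax(logits):
    """Convert raw class scores (logits) into a probability distribution."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two classes: [normal, pneumonia]
probs = softmax([0.3, 2.1])
print(probs)  # two probabilities summing to 1; the second class dominates here
```

The class with the largest probability is taken as the predicted label, which is what the final dense layer of the modified networks produces.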

Quantized Neural Networks
Quantized Neural Networks (QNNs) use low-precision weights and activations. These networks are trained from scratch at an arbitrary fixed-point precision. At iso-accuracy, QNNs that use fewer bits require deeper and wider network architectures than networks that use higher-precision operators, while requiring less complex arithmetic and fewer bits per weight (Moons, Goetschalckx, Van Berckelaer, & Verhelst, 2017).
A method was introduced to train quantized neural networks (QNNs) with weights and activations of extremely low precision (for example, 1 bit) at run time. During the training stage, the quantized weights and activations are used to compute the parameter gradients. At inference time, QNNs drastically reduce memory size and accesses, replacing most arithmetic operations with bit-wise operations (Hubara, Courbariaux, Soudry, El-Yaniv, & Bengio, 2017).
A quantization scheme that allows inference to be performed using integer-only arithmetic was proposed in (Jacob, et al., 2018). It can be implemented more efficiently than floating-point inference on commonly available integer-only hardware.
In our approach, the weights of an existing trained model are loaded and adjusted for quantization. We used the pre-trained networks MobileNetV1 and MobileNetV2. After they were trained on the pneumonia images, TensorFlow Lite quantization was applied. Results are given in Table 1.
The quantization scheme is an affine mapping of integers $q$ to real numbers $r$, that is, of the form (Jacob, et al., 2018):

Equation 1: $r = S(q - Z)$

where the constants $S$ (scale) and $Z$ (zero-point) are the quantization parameters. Consider the multiplication of two $N \times N$ matrices of real numbers, $r_1$ and $r_2$, with product $r_3 = r_1 r_2$. We denote the entries of each of these matrices ($\alpha = 1, 2, 3$) as $r_\alpha^{(i,j)}$, the quantization parameters with which they are quantized as $(S_\alpha, Z_\alpha)$, and the quantized entries as $q_\alpha^{(i,j)}$. Then, Equation 1 becomes:

$S_3 (q_3^{(i,k)} - Z_3) = \sum_{j=1}^{N} S_1 (q_1^{(i,j)} - Z_1) \, S_2 (q_2^{(j,k)} - Z_2)$

Table 2 shows how the resulting models were fully quantized. We still keep float input and output for convenience.
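A minimal sketch of this affine quantization in plain Python follows. This is a hypothetical 8-bit example; the actual TensorFlow Lite implementation selects S and Z per tensor from calibration statistics:

```python
def quant_params(rmin, rmax, qmin=0, qmax=255):
    """Choose scale S and zero-point Z so [rmin, rmax] maps onto [qmin, qmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # the real range must contain 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """r = S*(q - Z)  =>  q = round(r/S + Z), clamped to the integer range."""
    q = round(r / scale + zero_point)
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its quantized representation."""
    return scale * (q - zero_point)

# Quantize a weight in [-1, 1] to uint8 and back.
S, Z = quant_params(-1.0, 1.0)
q = quantize(0.5, S, Z)
r = dequantize(q, S, Z)  # close to 0.5, within one quantization step
```

Note that the real value 0 is exactly representable (it maps to the zero-point Z), a property the scheme of Jacob et al. (2018) requires so that zero-padding introduces no quantization error.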

MobileNetV1
This network introduces a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light-weight deep neural networks. Two simple global hyper-parameters are introduced that trade off efficiently between latency and accuracy. These hyper-parameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem (Howard, et al., 2017).
The MobileNet model is based on depthwise separable convolutions, a form of factorized convolution that factorizes a standard convolution into a depthwise convolution and a 1 × 1 convolution called a pointwise convolution. In MobileNets, the depthwise convolution applies a single filter to each input channel. The pointwise convolution then applies a 1 × 1 convolution to combine the outputs of the depthwise convolution. The depthwise convolution with one filter per input channel is written in Equation 5 (Howard, et al., 2017).
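The saving from this factorization can be illustrated numerically. Following Howard et al. (2017), a standard convolution costs DK·DK·M·N·DF·DF multiply-adds, while the depthwise plus pointwise pair costs DK·DK·M·DF·DF + M·N·DF·DF, a reduction by a factor of 1/N + 1/DK². A small sketch with hypothetical layer dimensions:

```python
def standard_cost(dk, m, n, df):
    """Mult-adds of a standard convolution: dk x dk kernel, m -> n channels, df x df output."""
    return dk * dk * m * n * df * df

def separable_cost(dk, m, n, df):
    """Depthwise (dk x dk per input channel) plus pointwise (1 x 1, m -> n channels)."""
    return dk * dk * m * df * df + m * n * df * df

# Hypothetical layer: 3x3 kernel, 128 -> 256 channels, 14x14 feature map
std = standard_cost(3, 128, 256, 14)
sep = separable_cost(3, 128, 256, 14)
ratio = sep / std
print(ratio)  # equals 1/256 + 1/9, i.e. roughly 8-9x fewer operations
```

With a 3 × 3 kernel the factorized form needs roughly an order of magnitude fewer operations, which is why MobileNets are practical on mobile hardware.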

MobileNetV2
This network advances the state of the art for mobile-oriented computer vision models, significantly reducing the number of operations and the memory required while maintaining the same accuracy. Its main contribution is a new layer module, the inverted residual with linear bottleneck. This module takes as input a compressed low-dimensional representation that is first expanded to a high dimension and then filtered with a lightweight depthwise convolution (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018).

Mobile Application Development
The method chosen in our work uses the Java API of TensorFlow Lite (Jacob, et al., 2018), suitable for Android and iOS application development. TensorFlow Lite is TensorFlow's solution for lightweight models on mobile and embedded devices, allowing a trained model to run on a mobile device. It also makes use of hardware acceleration on Android through the Machine Learning APIs (see Figure 5). In this case, the application was developed for the Android platform and classifies thoracic images, the goal being to aid in the rapid and accurate diagnosis of childhood pneumonia. To this end, a simple and intuitive interface was developed, consisting of two functionalities: the first is the search option (Figure 6), which loads an image present on the device; the second is the classification option, which displays the most likely class for the image.

Results and Discussion
In this section we present the results obtained in each stage of the development of this paper. We provide a comparison between the pre-trained networks MobileNetV1 and MobileNetV2, with the Batch Size parameter set to 30 and 40, respectively. Both networks employ the Adam optimizer, 100 epochs per training run and a learning rate of 0.0001.
MobileNetV1 took 150 minutes to be fully trained, while MobileNetV2 took 200 minutes.

Evaluation Metrics
The model's precision can be estimated by Equation 6, which accumulates the differences between the actual values $y_i$ and the predicted values $\hat{y}_i$. This allows us to infer the generalization capacity of the network.

Equation 6: $E = \sum_{i=1}^{N} (y_i - \hat{y}_i)$
As a statistical tool, we have the confusion matrix, which provides the basis for describing the accuracy of the classification as well as characterizing the errors, helping to refine the accuracy (Saraiva, et al., 2018). The confusion matrix is an array of numbers arranged in rows and columns expressing the number of sample units of a particular category, inferred by a decision rule, compared to the actual category.
The measures derived from the confusion matrix include: total accuracy (used in this work), individual class precision, producer's precision, user's precision and the Kappa index, among others.
The total accuracy is calculated by dividing the sum of the main diagonal of the error matrix, $\sum_i x_{ii}$, by the total number of samples collected, $n$, according to Equation 7:

Equation 7: $A = \frac{1}{n} \sum_{i=1}^{c} x_{ii}$
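Equation 7 can be sketched directly; the confusion-matrix counts below are hypothetical, not taken from the paper:

```python
def total_accuracy(cm):
    """Total accuracy: sum of the main diagonal over all entries of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical 2x2 confusion matrix: rows = actual class, columns = predicted class
#            pred normal  pred pneumonia
cm = [[150, 10],   # actual normal
      [12, 420]]   # actual pneumonia
acc = total_accuracy(cm)  # (150 + 420) / 592
```

The off-diagonal entries are exactly the misclassifications, which is why the confusion matrix also serves to characterize the errors.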
To fully evaluate the effectiveness of the models, precision and recall are examined.
Unfortunately, precision and recall are often in tension. That is, improving precision usually reduces recall and vice-versa.

Equation 9: $Recall = \frac{TP}{TP + FN}$

The F1 Score is a simple metric that takes both precision and recall into account, so one can try to maximize it to improve the model. It is simply the harmonic mean of precision and recall:

Equation 10: $F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$

The AUC-ROC curve is a measure of performance for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability (Bowers & Zhou, 2019).

Results
Before quantization, the MobileNetV1 model occupied 70.4 MB of storage. After quantization, its size decreased considerably, to 23.3 MB. Likewise, MobileNetV2's initial size before quantization was 80.1 MB; following the same procedure applied to MobileNetV1, it was reduced to 25.0 MB (see Table 1). This significant decrease in model size is crucial for the development of the proposed mobile application, as it also reduces the computational cost necessary for the application to run on a mobile device.
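As a back-of-the-envelope check, the figures in Table 1 correspond to roughly a 3x compression, somewhat below the ideal 4x of a full float32-to-int8 conversion, which is consistent with keeping float input and output:

```python
# (before, after) model sizes in MB, as reported in Table 1
sizes = {
    "MobileNetV1": (70.4, 23.3),
    "MobileNetV2": (80.1, 25.0),
}

for name, (before, after) in sizes.items():
    factor = before / after
    saved_pct = 100 * (1 - after / before)
    print(f"{name}: {factor:.1f}x smaller, {saved_pct:.0f}% of storage saved")
```

Both models shed about two thirds of their storage footprint, which is what makes on-device inference practical.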
Both results compare favorably with InceptionV3, used in previous work that reported an accuracy of 92.8%. This is a strong indication of the benefits of quantization-based compression of pre-trained neural networks applied to image classification.
Results are presented in Table 4. Moreover, these preliminary results encouraged us to design an efficient Android application, with a simple and intuitive user interface, capable of classifying thoracic images as normal or pneumonia. We aim at ease of use, mobility and classification accuracy under low computational cost and energy constraints. Figures 6 and 7 show the mobile application interface used in this paper, which demonstrates the efficiency of each pre-trained network. In the tests performed, MobileNetV1 stands out over MobileNetV2, achieving an improvement of 2.5% and 3.1% in the Normal and Pneumonia classes, respectively. Figure 8 illustrates the training history of the proposed networks. It can be seen that the test accuracy of both models during training is much higher than the training accuracy.
Hence, it is possible to perceive the generalization power of the models when they are tested.

Conclusion
This paper proposed a mobile application for the classification of X-ray images comprising normal and diseased (pneumonia) images. We employed two pre-trained neural networks, MobileNetV1 and MobileNetV2, with transfer learning strategies together with a quantization technique. We showed that the compression resulting from the quantization process on both MobileNetV1 and MobileNetV2 led to a substantial reduction in the amount of data to be processed and, therefore, made it possible to run the classification process efficiently on a mobile device.
The mobile application also presents a simple and intuitive user interface and is capable of classifying thoracic images as either normal or abnormal (pneumonia) with an accuracy of up to 95.4% and 94.2% for MobileNetV1 and MobileNetV2, respectively. This is an improvement over a similar method with 92.8% accuracy. As future work, we intend to carry out classification with more classes, identifying the type of pneumonia, which may be viral, bacterial or viral caused by COVID-19.