EEG multipurpose eye blink detector using convolutional neural network

The electrical signal emitted by eye movements produces a very strong artifact in the EEG signal due to its close proximity to the sensors and its abundance of occurrence. In the context of detecting eye blink artifacts in EEG waveforms for subsequent removal and signal purification, multiple strategies have been proposed in the literature. The most commonly applied methods require a large number of electrodes and complex equipment for sampling and processing data. The goal of this work is to create a reliable, user-independent algorithm for detecting and removing eye blinks in EEG signals using a convolutional neural network (CNN). Three sets of public EEG data were used for training and validation. All three sets contain samples obtained while the recruited subjects performed assigned tasks that included blinking voluntarily at specific moments, watching a video, and reading an article. The model used in this study was able to acquire a comprehensive understanding of the features that distinguish a clean EEG signal from a signal contaminated with eye blink artifacts, without overfitting to features specific to the conditions under which the signals were recorded.


Introduction
Electroencephalography had its origins in the 1920s, when Hans Berger, a neuropsychiatrist from Germany, recorded potentials from the scalp of patients with skull defects and, a few years later, with more sensitive equipment, from intact subjects. Berger published over 20 reports on the electroencephalogram, a term he introduced in 1929 [1].
In the 1960s, electroencephalography (EEG) was tied to the laboratory due to equipment and recording requirements. Today the field has moved from simple and artefact-sensitive EEG recording to making real the vision of brain-computer communication. In the last 40 years, direct brain-computer interaction went from simple communication programs to sophisticated BCI-controlled applications. Generally speaking, a brain-computer interface (BCI) is a device that connects the brain to a computer and decodes in real time a specific, predefined brain activity. This brain activity has to be measured either directly, via the electrical activity of nerve cells, or indirectly [2].
The most convenient and widely used method for recording brain activity in BCI applications is electroencephalography (EEG). During EEG, brain electrical activity is recorded by placing electrodes on the scalp. This method has high temporal resolution and is safe, easy to use, and affordable [3] [4].
Electrodes are placed on specific scalp locations, for example over parietal areas toward the back of the skull to measure activity related to attentional and memory processes. The acquired analogue signal is digitized, amplified, and filtered according to the requirements of the EEG signal of interest. Relevant and precisely defined features of the resulting signal are extracted from the background "noise" generated by billions of active nerve cells. These features are then translated into device commands specific to the BCI-controlled applications [2].
Due to the nature of the acquisition process, the EEG signal can mix with other biological potentials. The greatest influence comes from the ocular signal (EOG), whose amplitude is in the millivolt range [5], [6], but recordings can also be contaminated by muscle (EMG) and cardiac (ECG) signals, likewise in the millivolt range [7], [8]. Broadly speaking, artifacts can originate from internal sources (physiological activities of the subject) and external sources (environmental interference, equipment, electrode pop-up, and cable movement), and contaminate recordings in both the temporal and spectral domains [3].
The signal emitted by eye movement produces a very strong artifact in the EEG signal due to its close proximity to the sensors and its abundance of occurrence [9]. For this reason, eye blinks are known to substantially contaminate EEG signals, and thereby severely impact the decoding of EEG signals in various medical and scientific applications [10]. This impact can be measured using the signal-to-noise ratio (SNR), a dimensionless number given by the signal power divided by the noise power: SNR = P_S / P_N, where the signal power P_S is the square of the signal s(t) integrated over time and normalized, and P_N is the analogous power of the noise [11].
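As a concrete illustration, the SNR of a discrete recording can be computed from the time-averaged powers of the signal and the noise (a minimal sketch in pure Python; the decibel conversion is a common convention, not prescribed by [11]):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio: time-averaged power of the signal
    divided by that of the noise, reported in decibels."""
    p_s = sum(x * x for x in signal) / len(signal)  # normalized signal power P_S
    p_n = sum(x * x for x in noise) / len(noise)    # noise power P_N
    return 10 * math.log10(p_s / p_n)

# A tone 10x the amplitude of the noise floor gives a power ratio of 100,
# i.e. 20 dB.
print(snr_db([10.0, -10.0, 10.0, -10.0], [1.0, -1.0, 1.0, -1.0]))
```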
In the context of detecting eye blink artifacts in EEG waveforms for subsequent removal and signal purification, multiple strategies have been proposed in the literature. The most commonly applied methods require a large number of electrodes and complex equipment for sampling data. Some of these methods are described in [12], where the efficiency of three different Independent Component Analysis (ICA) algorithms was compared for detecting eye blink occurrences within recorded data, and in [13], where a Support Vector Machine (SVM) was used to identify the blinks, obtaining an accuracy of 98.4%. More recent methods require fewer EEG channels and simpler equipment, thereby simplifying and improving data collection. Several studies implementing these methods have been reported: [10] proposed an algorithm able to estimate the timestamps of the start and end of each blink; [14] compared the efficiency of an SVM against an Artificial Neural Network (ANN) in detecting eye opening, closing, and blinking, reaching accuracies of 91.9% for the SVM and 89.3% for the ANN in the blink detection experiment; [15] created an ANN focused on blink detection in EEG signals, obtaining an accuracy of 90.85%; and [16] proposed a one-dimensional (1D) convolutional neural network (CNN) to classify recorded eye blink EEG data as voluntary or involuntary, obtaining an accuracy of 97.92%.
Although CNNs have gained popularity over the last five years primarily as the gold standard for image classification, appearing in many state-of-the-art deep learning algorithms [17], these networks can also be used for time series classification. Using a CNN for feature extraction has the advantage of not requiring a prior filter model or feature-engineered treatment, as the kernels' weights are obtained during training and time-dependent features are extracted by the internal structure of the convolutional layers [18].
As described in [18], CNNs can be used for multivariate or univariate time series, so they can be applied directly to single- or multi-channel EEG signals. The standard approach is to create a neural network with a fixed input shape and one or more stacks of convolutional layers, each followed by a pooling layer. The number of kernels and their sizes are hyperparameters that can be defined by exploring data from previous experience with the data format [18].
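The feature-extraction role of a convolutional layer can be illustrated with a hand-written 1D "valid" convolution over a toy series (the kernel below is hand-picked for illustration; in a CNN the weights are learned during training):

```python
def conv1d_valid(series, kernel):
    """1D 'valid' convolution (strictly, cross-correlation, as in CNN
    layers): slide the kernel over the series and take dot products."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]

# A difference kernel responds strongly to the sharp edge in the series,
# the kind of transient an eye blink introduces into an EEG trace.
series = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
edge_kernel = [-1.0, 1.0]
print(conv1d_valid(series, edge_kernel))  # peaks where the step occurs
```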
The goal of this work is to create a reliable, user-independent algorithm for detecting and removing eye blinks in EEG signals using a CNN.

Methodology
The raw EEG data present in the datasets used to train and evaluate this model were collected by Agarwal and Sivakumar [10] in a series of experiments described in the article "Blink: A Fully Automated Unsupervised Algorithm for Eye-Blink Detection in EEG Signals". Three sets of EEG data collected in those experiments, using the OpenBCI platform with a sampling frequency of 250 Hz, were used for the purposes of this article. All three sets contain samples obtained while the recruited subjects performed assigned tasks that included blinking voluntarily at specific moments, watching a video, and reading an article. For each participant in the experiments, two files were generated: one with the raw EEG data and the other with the true labels of the instants when eye blinks occurred. The set named EEG-IO contains samples of 20 subjects blinking voluntarily at specific moments, with total experiment durations ranging from 75 to 100 seconds. The sets named EEG-VV and EEG-VR contain samples of 12 subjects watching a video and reading an article, respectively, both experiments lasting 5 minutes.
Aiming at the highest performance of the convolutional neural network described in this paper, data preprocessing was performed identically on all three sets of data, using the Python programming language. The entire process described below can be checked at [19]. Initially, the EEG readings from electrodes P1 and P2, along with the time readings, were loaded from the data files into a list of tuples. The same was done with the true-label files, extracting from them the instants when eye blinks were registered. A moving average filter was applied only to the readings from the data files. Based on the information available about the data collection process, tuples with corrupted EEG readings were removed from the list. Following these initial steps, two sets were created: the first containing EEG readings with eye blinks and the second containing EEG readings without eye blink occurrences. The first set consists of same-sized, one-second windows (two channels of 256 readings each) taken from the regions where blink occurrences were registered; each window is centered on the reading at the exact instant registered in the labels file. Once the windows with blink EEG readings were extracted from the original set, the remaining data were divided into same-sized arrays, forming the dataset with no eye blink occurrences. To balance the quantity of data in the two datasets, they were resampled.
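The two core steps of this pipeline, smoothing and blink-centered windowing, can be sketched in pure Python (a minimal sketch; the filter width and the handling of blinks near the recording's edges are illustrative assumptions, as the paper does not specify them):

```python
def moving_average(samples, width=3):
    """Smooth one channel with a simple moving average filter."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def blink_window(samples, blink_index, width=256):
    """Cut a fixed-size window centered on the sample at which a blink
    was registered, mirroring the one-second windows described above."""
    half = width // 2
    if blink_index - half < 0 or blink_index + half > len(samples):
        return None  # blink too close to the recording's edge (assumption)
    return samples[blink_index - half: blink_index + half]

smoothed = moving_average([0.0, 3.0, 0.0, 3.0, 0.0])
window = blink_window(list(range(1000)), blink_index=500, width=256)
```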
To conclude the preprocessing, the datasets were joined together and divided into training, testing, and validation sets. The sets were normalized and reshaped following the input format required by the convolutional neural network. Training data from two of the sets were then combined and fed into the convolutional neural network to train the CNN model.
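The paper does not state which normalization was applied; per-window z-scoring, shown below, is one common choice for EEG windows and serves as an illustrative stand-in:

```python
import math

def zscore(window):
    """Normalize one window to zero mean and unit variance before
    feeding it to the network (illustrative choice of normalization)."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return [(x - mean) / math.sqrt(var) for x in window]

normalized = zscore([1.0, 2.0, 3.0, 4.0])
```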

Results
The final architecture of the convolutional neural network (CNN) model, used for results analysis over the testing data, was reached by finding the most efficient combination of hyperparameters for each layer. This process was performed by running a random search with the Keras Tuner package [20]. From the convolutional layer, the hyperparameters selected and tuned were the filters, the kernel size, and the strides, which correspond, respectively, to the number of kernels applied over the input data for extracting relevant features, the dimension of these filters, and their moving pace. From the dense layers, the only hyperparameter tuned was the number of units, which corresponds to the number of neurons present in each layer. Other parameter choices important to the efficiency of the model were the padding and the activation function. The former belongs to the convolutional layer and defines an operation mode used to avoid losing information over the input data, or shrinking the output data, when the convolution filters are applied; the latter is needed in both convolutional and dense layers and assists the learning process of the CNN.
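The idea behind the random search can be sketched without any library: sample hyperparameter combinations at random, score each, and keep the best (the ranges and scoring function below are illustrative, not the search space or objective actually used with Keras Tuner):

```python
import random

# Hypothetical search space over the hyperparameters named above.
SEARCH_SPACE = {
    "filters":     list(range(16, 257, 2)),
    "kernel_size": [8, 16, 32, 64],
    "strides":     [1, 2],
    "units":       list(range(32, 257, 8)),
}

def random_search(score, n_trials=50, seed=0):
    """Sample hyperparameter combinations at random; return the best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        trial = {name: rng.choice(choices)
                 for name, choices in SEARCH_SPACE.items()}
        s = score(trial)
        if s > best_score:
            best, best_score = trial, s
    return best, best_score

# Stand-in scoring function; in practice this would be the validation
# accuracy of a model trained with the sampled hyperparameters.
toy_score = lambda t: -abs(t["filters"] - 130) - abs(t["units"] - 152)
best, _ = random_search(toy_score)
```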
The model used in this paper is represented in Figure 1 and consists of 5 stacked layers. The first layer is a one-dimensional convolutional layer with 130 filters, a kernel size of 32, the 'valid' padding mode, strides equal to 1, and the rectified linear (ReLU) activation function. The following layer is a flatten layer used to reshape the data received from the previous layer into a one-dimensional tensor that can be accepted as input by the next layer. The last three layers are the dense layers that compose the fully connected part of the model. The first dense layer has 152 neurons and uses the ReLU activation function, the second dense layer has 150 neurons and also uses ReLU, and the last dense layer has a single neuron that provides the output of the model, using the sigmoid activation function to discriminate whether the received input is an EEG signal with an eye blink occurrence or not.

The model's classification performance over the three sets of validation data was evaluated using four standard metrics: accuracy, precision, recall, and F1-score. These values were calculated from the confusion matrix generated for each validation set. These matrices contain the counts of true positive, true negative, false positive, and false negative classifications. True positives and true negatives correspond to correct classifications made by the model, while false positives and false negatives correspond to incorrect ones. All three matrices are plotted in Figure 2, exhibiting a darker blue shade in the true positive and true negative squares and a lighter blue shade in the false positive and false negative squares. Table 1 reports the calculated metric values for each dataset, showing good and consistent performance on all validation datasets.
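The architecture above can be written down in a few lines of Keras (a sketch, assuming TensorFlow is available; the (256, 2) input shape follows the one-second, two-channel windows from the Methodology, and the optimizer and loss are typical choices for binary classification, not stated in the paper):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(256, 2)),               # one-second, two-channel window
    layers.Conv1D(filters=130, kernel_size=32,  # 130 learned kernels of size 32
                  strides=1, padding="valid",
                  activation="relu"),
    layers.Flatten(),                           # to a one-dimensional tensor
    layers.Dense(152, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # blink vs. no blink
])
model.compile(optimizer="adam",                 # assumed, not from the paper
              loss="binary_crossentropy",
              metrics=["accuracy"])
```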
The fact that each dataset was collected while the participants performed three different tasks demonstrates that the CNN model was able to learn features and patterns contained in eye blink signals regardless of the situation or the external stimulus present when they were registered. Table 2 provides an overview of the existing methods used in the literature for eye blink classification and their respective accuracies. For comparison, the arithmetic average of the accuracies reached in the three validation sets was calculated, resulting in 98.733%. Many attempts were made to improve the classification performance of the CNN even further, and, by analyzing the signal samples that were misclassified by the model, a data limitation stood out: many misclassified samples were contaminated with a substantial amount of noise, which hinders and limits both classification performance and the learning process of the CNN. Therefore, improving the quality of the collected data is a key factor in enhancing the classification performance of this model. Figure 3a shows a signal sample extracted from the IO set and Figure 3b one extracted from the VV set; both signals exemplify the performance barrier mentioned.
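The four metrics reported in Table 1 follow directly from the confusion-matrix counts (the counts below are illustrative, not the paper's values):

```python
def metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted blinks, how many real
    recall    = tp / (tp + fn)          # of real blinks, how many found
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts only.
acc, prec, rec, f1 = metrics(tp=95, tn=93, fp=7, fn=5)
```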

Conclusion
This work presented a powerful application of the convolutional neural network (CNN) technique to the classification of EEG signal artifacts. The model used in this study was able to acquire a comprehensive understanding of the features that distinguish a clean EEG signal from a signal contaminated with eye blink artifacts, without overfitting to features specific to the conditions under which the signals were recorded. This is demonstrated by the fact that the model was trained with data from only two of the three available sets, each containing data collected under different external stimuli, while data from the remaining set was used for validation, on which the model achieved excellent performance. Thus, this technique showed its potential as a tool for removing eye blink artifacts in studies that require real-time EEG analysis.