EEG signals record the electrical activity of the brain using electrodes placed on the scalp. They are noisy, full of artifacts, and, above all, they are not the type of signals people are used to dealing with (images, charts, ...). Doctors, neuroscientists, and biomedical engineers typically train for years to understand and extract meaningful information from EEG data.
Even then, the raw recorded data needs to be processed before specialists look at it. Temporal and spatial filtering is usually applied, as well as artifact rejection procedures, since artifacts appear even when the participant remains still during the recording. This processed EEG can then be visually inspected to detect anomalies (e.g. epileptic episodes), changes in mental state (e.g. sleep phases), or to study grand-average responses across groups of people.
Visual inspection is a long, expensive, and tedious process. It does not scale up well and cannot be transferred to BCI applications. AI and machine learning tools are the perfect companion to automate, extend, and improve EEG data analysis. Indeed, BCI systems such as spellers or brain-controlled devices are based on decoding pipelines that make extensive use of different machine learning algorithms.
Before the deep learning revolution, the standard EEG pipeline combined techniques from signal processing and machine learning to enhance the signal-to-noise ratio, deal with EEG artifacts, extract features, and interpret or decode signals. Figure 1 shows the most common pipeline for processing EEG.
From a computational point of view, the raw EEG signal is simply a discrete-time multivariate time series (i.e. one with multiple dimensions). The number of EEG channels determines the dimension of each point of the series, and each point corresponds to the samples acquired across all channels at the same time instant. The number of points in the series depends on the recording duration and the sampling rate (e.g. 256 Hz). These raw signals are rarely used as-is, since they may contain DC offsets and drifts, electromagnetic noise, and artifacts that need to be filtered out. Signal processing is used in the first steps to remove noise, filter out artifacts, or isolate an improved version of the signal of interest. Noise and artifacts are such an important part of EEG analysis that a whole body of literature has studied, and continues to study, this problem. You can learn about this in our dedicated post All about EEG artifacts and filtering tools.
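To make this data layout concrete, here is a minimal sketch, assuming NumPy and SciPy, of a raw recording stored as a channels-by-samples array, with a high-pass filter removing the DC offset and slow drifts; the montage size and filter cutoff are illustrative choices.

```python
# A minimal sketch of raw EEG as a multivariate time series (channels x samples)
# and a high-pass filter to remove the DC offset and slow drifts.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 256                          # sampling rate in Hz (e.g. 256 Hz, as in the text)
n_channels, n_seconds = 32, 10    # illustrative montage size and recording length
n_samples = fs * n_seconds

# Simulated raw recording: one row per EEG channel, one column per time sample.
rng = np.random.default_rng(0)
raw = rng.standard_normal((n_channels, n_samples)) + 50.0   # add a DC offset

# High-pass filter at 0.5 Hz (an assumed cutoff) to remove the offset and drifts.
b, a = butter(N=4, Wn=0.5, btype="highpass", fs=fs)
clean = filtfilt(b, a, raw, axis=1)

print(raw.shape)      # (32, 2560): channels x samples
print(clean.mean())   # close to zero once the DC offset is removed
```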
Once the signal is clean, it is time to enhance and uncover the brain patterns and neural correlates of interest. In many cases, the brain processes under study are located in a particular frequency band, such as the P300 evoked response, which occurs in the theta band (4-7 Hz), or the modulation of the sensorimotor (mu) rhythms, which occurs between 8 and 15 Hz. The simplest processing is to use frequency filters, such as low-pass or band-pass filters, to isolate the bands of interest and remove the frequencies that carry no relevant information. Figure 2 shows the spectrum of EEG activity and the most common bands used when analyzing EEG correlates.
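As an illustration of this step, the sketch below (again assuming SciPy) band-pass filters a cleaned recording into the classic bands; the band edges follow common conventions and vary slightly between authors.

```python
# A hedged sketch of band-pass filtering cleaned EEG into the classic bands.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 256
# Band edges are conventional choices; exact limits differ between authors.
bands = {"delta": (1, 4), "theta": (4, 7), "alpha/mu": (8, 15),
         "beta": (15, 30), "gamma": (30, 100)}

def bandpass(eeg, low, high, fs, order=4):
    """Zero-phase band-pass filter applied independently to each channel."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)

eeg = np.random.randn(32, fs * 10)   # cleaned EEG: channels x samples (placeholder)
band_signals = {name: bandpass(eeg, lo, hi, fs) for name, (lo, hi) in bands.items()}
```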
Figure 2: Left: EEG spectrum for two different conditions, focused vs. distracted. Based on it, one may select the frequency range shaded in grey to distinguish these two conditions. Right: EEG activity filtered into the most common bands. The gamma band (30-140 Hz) also correlates with cognitive processes and shows alterations in cognitive disorders.
After pre-processing, it is time to extract meaningful features from the cleaned EEG data. In the pre-deep learning era, feature extraction was based on ad-hoc methods tailored to the brain process of interest, ranging from hand-crafted features to more sophisticated techniques such as linear and non-linear spatial filtering. The latter range from generic methods, such as principal component analysis and independent component analysis, to more EEG-specific ones, such as common spatial patterns (CSP) (Blankertz, 2007) and its variants (Ang, 2008) for power features, and xDAWN (Rivet, 2009) for temporal ones. Figure 3 shows one of the simplest feature extraction methods, which simply subsamples the cleaned EEG signals directly in the temporal or frequency domain.
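A hedged example of such a simple hand-crafted feature is the average band power per channel, computed here with Welch's method over the 8-15 Hz sensorimotor range; the epoch shapes and the chosen band are illustrative.

```python
# A minimal sketch of a simple, hand-crafted feature extraction step:
# average band power per channel, estimated from the power spectral density.
import numpy as np
from scipy.signal import welch

def bandpower_features(epochs, fs, band=(8, 15)):
    """epochs: (n_trials, n_channels, n_samples). Returns the mean band power
    per trial and channel within the given frequency band."""
    freqs, psd = welch(epochs, fs=fs, nperseg=fs, axis=-1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[..., mask].mean(axis=-1)          # (n_trials, n_channels)

fs = 256
epochs = np.random.randn(40, 32, 2 * fs)         # 40 epochs, 32 channels, 2 s each
X = bandpower_features(epochs, fs)               # mu-band (8-15 Hz) power features
print(X.shape)                                   # (40, 32)
```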
The extracted features are usually tailored to the specific application, such as finding differences between experimental conditions (e.g. levels of attention, responses to mismatched actions), distinguishing between a group of predefined classes (e.g. a speller), predicting behavior (e.g. by anticipating motion in neurorehabilitation), or finding anomalies with respect to a normative database (e.g. QEEG or seizures). Current state-of-the-art techniques include Riemannian geometry-based classifiers, filter banks, and adaptive classifiers, used to handle, with varying levels of success, the challenges of EEG data (Perronnet, 2016; Lotte, 2018).
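As a sketch of the Riemannian approach, and assuming the pyriemann package is available, spatial covariance matrices can be fed to a minimum-distance-to-mean (MDM) classifier inside a scikit-learn pipeline; the data shapes and the covariance estimator below are illustrative choices.

```python
# A sketch of a Riemannian geometry-based classifier, assuming pyriemann is
# installed. Trials are classified by the Riemannian distance of their spatial
# covariance matrix to each class mean.
import numpy as np
from sklearn.pipeline import make_pipeline
from pyriemann.estimation import Covariances
from pyriemann.classification import MDM

X = np.random.randn(100, 32, 512)        # 100 trials x 32 channels x 512 samples
y = np.random.randint(0, 2, size=100)    # placeholder class labels

clf = make_pipeline(Covariances(estimator="oas"), MDM())
clf.fit(X, y)
print(clf.predict(X[:5]))
```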
Once the features are ready, it is time to use them to automatically decode the EEG. The most common approach is supervised learning, which uses a set of examples known as the training dataset to learn a model that can classify, predict, or identify EEG patterns based on the extracted features. A large variety of methods exist. The most common are classification methods, which assign an EEG pattern to one of a set of predefined classes, and regression methods, which transform the EEG pattern into another signal, such as a motion direction. Commonly used methods include simple linear ones (LDA for classification, multiple linear regression for regression), kernel methods such as SVMs, random forests, neural networks (see Section 4 for the deep learning methods), or more sophisticated combinations of these.
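A minimal scikit-learn sketch of this supervised step on already-extracted features, using LDA and an SVM with cross-validated accuracy; the features and labels are random placeholders.

```python
# A hedged sketch of supervised decoding on extracted features with scikit-learn.
# X and y are random placeholders for real feature vectors and class labels.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.randn(200, 32)             # 200 trials x 32 features
y = np.random.randint(0, 2, size=200)    # e.g. target vs non-target

lda = LinearDiscriminantAnalysis()        # simple linear classifier
svm = SVC(kernel="rbf")                   # kernel-method alternative

# Cross-validation estimates decoding accuracy on held-out trials.
print(cross_val_score(lda, X, y, cv=5).mean())
print(cross_val_score(svm, X, y, cv=5).mean())
```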
Whatever the method, the supervised approach needs a labeled training dataset. This dataset is used to train and evaluate the method, normally using cross-validation. There are some important considerations for EEG decoders due to the non-stationary and subject-dependent nature of EEG: 1) the features extracted for one person at a certain point in time may not be well-suited for the same person later on; and 2) the features for a particular participant may be different from those for another participant. In technical terms, the distribution of the features changes, and the models need to be retrained on an updated training dataset. Initially, decoders were participant- and session-specific, i.e. a dedicated training set was acquired for each participant and session. In practical terms, this has a big impact on the effort required to build and train these models and on their deployment outside laboratory settings. Calibrating each participant is an expensive and tedious process!
To overcome this limitation, several different methods are available. Current techniques aim at minimizing this calibration process and attempt to design robust methods that work over time and across participants (López-Larraz, 2018).
There is one last point that deserves discussion. Up to now, we have assumed that we know exactly at what point in time the relevant EEG information occurs. Although this is the case for many applications (e.g. an EEG speller), in many other BCI and neurotech applications this assumption does not hold. Consider, for instance, detecting an epileptic seizure at home, or detecting the intention to move a limb during a neurorehabilitation session. In this setup, it is necessary to process the EEG online, in an asynchronous manner. This adds an additional challenge to the decoding task: it is not enough to distinguish the patterns of interest; one also has to deal with the background EEG.
All the previous processing has to be extended or adapted to obtain such asynchronous decoding. The simplest way is to use a sliding window, computing the output for each window independently (see Figure 4 for a motion decoding example). During training, background EEG is labeled as “rest”, while the onset of motion is extracted using some calibration protocol, such as EMG activity or button presses. The same supervised learning algorithms can then be applied to learn the decoder, which can then be run over a sliding window to provide a continuous decoding.
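The sketch below illustrates this idea: a trained classifier is applied to overlapping windows of the continuous EEG to produce a continuous stream of decisions. The classifier and the feature function are placeholders for whatever was trained offline.

```python
# A minimal sketch of sliding-window (asynchronous) decoding over continuous EEG.
import numpy as np

def sliding_window_decode(eeg, fs, clf, extract_features, window_s=1.0, step_s=0.1):
    """Apply a trained classifier over overlapping windows of continuous EEG.
    eeg: (n_channels, n_samples); returns one label per window."""
    win, step = int(window_s * fs), int(step_s * fs)
    labels = []
    for start in range(0, eeg.shape[1] - win + 1, step):
        window = eeg[:, start:start + win]
        features = extract_features(window[np.newaxis])   # shape (1, n_features)
        labels.append(clf.predict(features)[0])           # e.g. "rest" vs "move"
    return np.array(labels)

# Toy usage with a dummy classifier that thresholds the mean feature value;
# in practice, clf would be the supervised decoder trained as described above.
class ThresholdClassifier:
    def predict(self, X):
        return np.where(X.mean(axis=1) > 0, "move", "rest")

fs = 256
eeg = np.random.randn(32, 30 * fs)                        # 30 s of continuous EEG
out = sliding_window_decode(eeg, fs, ThresholdClassifier(),
                            extract_features=lambda w: w.mean(axis=-1))
print(out[:10])
```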
Deep learning has radically changed machine learning in many domains (e.g. computer vision, speech, reinforcement learning) by providing general-purpose, flexible models that can work with raw data and learn the appropriate transformations for the problem at hand. These models can use large amounts of EEG data to directly learn features and capture the structure of the data in an efficient way that can then be transferred and/or adapted to different tasks. This end-to-end learning ability fits perfectly with the requirements of EEG analysis, where multiple interdependent processing steps are key and, until recently, were carefully designed for each specific purpose.
EEG data, however, has its own challenges: it is noisy and full of artifacts, it is non-stationary over time, and it varies considerably from one participant to another.
These challenges have not stopped researchers and practitioners from using deep learning, and the last 10 years have seen a rapid increase in results across all the fields related to EEG data. A very interesting review of more than 100 papers (Roy, 2019) sheds light on the current state of the art. Figure 5 shows how the main fields of application of EEG data analysis have tried deep learning and which deep models are the most common. There is still no clear dominant architecture, and many of the models applied to EEG have been directly borrowed from previous applications such as computer vision. As a result, convolutional neural networks (CNNs) are the most common architecture, while autoencoders and recurrent networks are also frequently used.
In most cases, the deep learning methods perform feature extraction and decoding simultaneously (see Figure 1), and they use the same supervised approach described in Section 2. In many cases, the pre-processing is simplified, for example by computing power features or segmenting the input data. Interestingly, some deep models have achieved end-to-end decoding performance that improves on previous methods while dealing directly with common EEG issues such as eye movements (opening and closing the eyes, blinking, etc.), artifacts, or background EEG.
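To make this concrete, here is a minimal PyTorch sketch of an end-to-end CNN mapping raw EEG epochs directly to class scores. The temporal-then-spatial convolution pattern is loosely inspired by published EEG CNNs such as those in braindecode (Schirrmeister, 2017), but the architecture and layer sizes here are purely illustrative.

```python
# A minimal, illustrative end-to-end CNN for raw EEG epochs (not a published model).
import torch
import torch.nn as nn

class TinyEEGNet(nn.Module):
    def __init__(self, n_channels=32, n_samples=512, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(1, 25), padding=(0, 12)),   # temporal conv
            nn.Conv2d(8, 16, kernel_size=(n_channels, 1)),           # spatial conv
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 16)),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy forward pass.
        with torch.no_grad():
            n_features = self.net(torch.zeros(1, 1, n_channels, n_samples)).shape[1]
        self.classifier = nn.Linear(n_features, n_classes)

    def forward(self, x):              # x: (batch, 1, n_channels, n_samples)
        return self.classifier(self.net(x))

model = TinyEEGNet()
epochs = torch.randn(4, 1, 32, 512)    # 4 raw EEG epochs (placeholder data)
logits = model(epochs)                 # (4, 2) class scores
```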
The authors of the meta-review in Roy (2019) computed the median improvement in accuracy to be around 5.4%, consistently across all the domains shown in Figure 6. Although they also point out some reproducibility concerns, the results show that, despite the challenges mentioned above, deep learning improves decoding results, in many cases with minimal or no pre-processing. One interesting consequence of using data-hungry deep learning techniques is that the standard participant/session-specific setup has been replaced by a more ecological one in which all sessions and participants contribute to the decoders. To give a more detailed view, we highlight results in three different applications that are relevant for understanding the current state of the art:
The previous examples show that deep learning techniques are now present in all EEG decoding applications and represent the current state of the art. There are still many open questions, such as which models work best and whether EEG-specific models and algorithms are needed.
For those interested in the technical details of how the different networks have been used with EEG, we recommend consulting some very complete reviews (Roy, 2019; Craik, 2019) that provide references to the relevant works. Most of the results have been obtained using public datasets, and code is available in the corresponding repositories (see, for instance, the braindecode GitHub repository for a complete deep-learning decoding pipeline using CNNs (Schirrmeister, 2017)).
Beware of the hype! The increasing number of EEG experiments and studies claiming better results with deep learning has not been free of controversy. Reproducibility and, when possible, comparison against well-established baselines are a must, and their absence should be treated with caution when evaluating any claims. Interestingly, Roy (2019) points out that only 7% of the reported results provide both the software (provided by 19% of the works) and the datasets (54%) required to evaluate and replicate the method. Sometimes there are legitimate reasons for not providing source code and datasets, such as the privacy of medical records, or the need to exploit the dataset or the code for your own research before making it public. Nevertheless, these good practices are becoming more common and, in some cases, are required for publication. They are always a good indicator of the quality of the work and a good starting point for your own projects.