The common spatial patterns (CSP) algorithm [1, 2] is a popular supervised decomposition method for EEG signal analysis, used to distinguish between two classes (conditions). It finds spatial filters that maximize the signal variance for one class while simultaneously minimizing the signal variance for the other class. Here, we review the CSP algorithm and its two main implementation approaches.

We assume that the EEG is already band-pass filtered and centered. Let X_i \in \mathbb{R}^{C \times T} be the EEG signal of trial i where C is the number of channels and T is the number of samples per trial. We compute the spatial covariance R_1 \in \mathbb{R}^{C \times C} by averaging over trials of class 1:

(1)   \begin{equation*}R_1 = \frac{1}{|\mathcal{I}_1|} \sum_{i \in \mathcal{I}_1} \frac{X_i X_i^T}{\trace(X_i X_i^T)}\end{equation*}


where \mathcal{I}_1 is the set of indices of trials belonging to class 1, |\mathcal{I}_1| denotes the size of this set, and \trace is the trace of a matrix. The spatial covariance R_2 is computed equivalently for class 2. In the following derivations of CSP, we assume that R_1 and R_2 have full rank (i.e., \rank(R_1) = \rank(R_2) = C).
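
For concreteness, the estimate in (1) might be computed with NumPy as in the following sketch (the function name and the (n_trials, C, T) array layout are our own conventions; trials are assumed band-pass filtered and centered as above):

import numpy as np

def class_covariance(trials):
    """Average trace-normalized spatial covariance of one class, eq. (1).

    trials: array of shape (n_trials, C, T), band-pass filtered and centered.
    """
    C = trials.shape[1]
    R = np.zeros((C, C))
    for X in trials:
        S = X @ X.T
        R += S / np.trace(S)
    return R / len(trials)

# R1 = class_covariance(trials_class1)  # trials of class 1, shape (n1, C, T)
# R2 = class_covariance(trials_class2)  # trials of class 2, shape (n2, C, T)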

The goal of CSP is to find a decomposition matrix W \in \mathbb{R}^{C \times C} that projects the signal x(t) \in \mathbb{R}^C from the original space to x_{CSP}(t) \in \mathbb{R}^C as follows:

(2)   \begin{equation*}x_{CSP}(t) = W^T x(t)\end{equation*}


with the following properties:

(3)   \begin{equation*}W^T R_1 W = D_1\end{equation*}


(4)   \begin{equation*}W^T R_2 W = D_2\end{equation*}


with the scaling chosen such that

(5)   \begin{equation*}D_1 + D_2 = I_C\end{equation*}


where I_C \in \mathbb{R}^{C \times C} is the identity matrix. In other words, W simultaneously diagonalizes R_1 and R_2, and the corresponding eigenvalues always sum to 1: the filter with the largest eigenvalue for class 1 has the smallest eigenvalue for class 2 and vice versa.
The columns of W are the spatial filters; the columns of the matrix A = (W^T)^{-1} are the corresponding spatial patterns.
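
As a quick numerical illustration (the helper names are ours), one can apply (2) to a single trial and verify properties (3)-(5) for a given W:

import numpy as np

def apply_csp(W, X):
    """Project a single trial X (C x T) onto the CSP components, eq. (2)."""
    return W.T @ X

def check_csp(W, R1, R2, atol=1e-8):
    """Check properties (3)-(5): W^T R1 W and W^T R2 W are diagonal
    and their diagonals sum to one."""
    D1 = W.T @ R1 @ W
    D2 = W.T @ R2 @ W
    diagonal = (np.allclose(D1, np.diag(np.diag(D1)), atol=atol)
                and np.allclose(D2, np.diag(np.diag(D2)), atol=atol))
    scaled = np.allclose(np.diag(D1) + np.diag(D2), 1.0, atol=atol)
    return diagonal and scaled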

Geometric approach

We determine a whitening transformation matrix U for the composite spatial covariance R_1 + R_2 such that

(6)   \begin{equation*}U (R_1 + R_2) U^T = I_C\end{equation*}


We factorize the composite spatial covariance

(7)   \begin{equation*}R_1 + R_2 = E F E^T \end{equation*}


where E is the orthogonal matrix of eigenvectors (in columns) and F is the diagonal matrix of the corresponding eigenvalues of R_1 + R_2. We define the whitening transformation U as

(8)   \begin{equation*}U = F^{-1/2} E^T\end{equation*}


and use it to transform the matrices R_1 and R_2:

(9)   \begin{equation*}\begin{aligned}S_1 &= U R_1 U^T \\ S_2 &= U R_2 U^T\end{aligned}\end{equation*}


We factorize the matrix S_1

(10)   \begin{equation*}S_1 = P D_1 P^T\end{equation*}


where P is the orthogonal matrix of eigenvectors and D_1 is the diagonal matrix of the corresponding eigenvalues of S_1. We define the decomposition matrix W^T as

(11)   \begin{equation*}W^T = P^T U\end{equation*}


This W satisfies (3):

(12)   \begin{equation*}W^T R_1 W = P^T S_1 P = D_1\end{equation*}


and, using (6), it also satisfies (4) with D_2 = I_C - D_1, so the scaling condition (5) holds as well:

(13)   \begin{equation*}W^T R_2 W = P^T U R_2 U^T P = P^T (I_C - U R_1 U^T) P = I_C - D_1\end{equation*}
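
Under the full-rank assumption made above, this geometric construction translates almost line by line into NumPy. The following is a minimal sketch (the function name csp_geometric is ours):

import numpy as np

def csp_geometric(R1, R2):
    """CSP by whitening and rotation, eqs. (6)-(11); filters are the columns of W."""
    # Eq. (7): eigendecomposition of the composite covariance.
    f, E = np.linalg.eigh(R1 + R2)          # f: eigenvalues, E: eigenvectors in columns
    # Eq. (8): whitening transformation (full rank assumed, so f > 0).
    U = np.diag(1.0 / np.sqrt(f)) @ E.T
    # Eq. (9): transform the class-1 covariance.
    S1 = U @ R1 @ U.T
    # Eq. (10): rotation P that diagonalizes S1; d1 = diag(D_1).
    d1, P = np.linalg.eigh(S1)
    # Eq. (11): W^T = P^T U, i.e. W = U^T P.
    W = U.T @ P
    return W, d1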

Generalized eigenvalue problem approach

We can also solve for W directly [3]. Summing (3) and (4) and using (5) gives

(14)   \begin{equation*}D_1 + D_2 = I_C = W^T (R_1 + R_2) W\end{equation*}


from which

(15)   \begin{equation*}W^T = W^{-1} (R_1 + R_2)^{-1}\end{equation*}


Inserting this into (3),

(16)   \begin{equation*}W^{-1} (R_1 + R_2)^{-1} R_1 W = D_1\end{equation*}


we get

(17)   \begin{equation*}R_1 W = (R_1 + R_2) W D_1\end{equation*}


which is a generalized eigenvalue problem. Equivalently, by inserting (15) into (4) we get the generalized eigenvalue problem

(18)   \begin{equation*}R_2 W = (R_1 + R_2) W D_2\end{equation*}
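
In code this is the most direct route: a symmetric generalized eigensolver such as scipy.linalg.eigh solves (17) directly and normalizes the eigenvectors so that W^T (R_1 + R_2) W = I_C, i.e. the scaling condition (5) comes for free. A minimal sketch (function name is ours):

import numpy as np
from scipy.linalg import eigh

def csp_gevd(R1, R2):
    """CSP via the generalized eigenvalue problem (17)."""
    # eigh(A, B) solves A w = lambda B w and normalizes the eigenvectors
    # such that W^T B W = I; here B = R1 + R2, so (5) holds by construction.
    d1, W = eigh(R1, R1 + R2)   # d1 = diag(D_1), in ascending order
    return W, d1

Since the eigenvalues are returned in ascending order, the first columns of W carry minimal variance for class 1 (maximal for class 2) and the last columns the opposite.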

Another solution

We also mention another solution W_g, often found in the literature, which satisfies (3) and (4) (with different diagonal matrices D_1 and D_2) but not the scaling condition (5). We express W_g^T from (3) as

(19)   \begin{equation*}W_g^T = D_1 W_g^{-1} R_1^{-1}\end{equation*}


and by inserting this into (4)

(20)   \begin{equation*}D_1 W_g^{-1} R_1^{-1} R_2 W_g = D_2\end{equation*}


we get

(21)   \begin{equation*}R_2 W_g = R_1 W_g (D_1^{-1} D_2)\end{equation*}


which is again a generalized eigenvalue problem, now with the eigenvalue matrix D_1^{-1} D_2. W_g differs from W only by a diagonal column-scaling matrix G:

(22)   \begin{equation*}W_g = W G^{1/2}\end{equation*}


where G = D_1 + D_2 is built from the diagonal matrices of the W_g solution, so that W = W_g G^{-1/2} satisfies the scaling condition (5).
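
For illustration, solving (21) with a generalized eigensolver and then undoing the column scaling recovers a W that satisfies (5). A short sketch under the same assumptions and naming as above:

import numpy as np
from scipy.linalg import eigh

# Eq. (21): eigh(R2, R1) solves R2 w = lambda R1 w with eigenvectors
# normalized so that W_g^T R1 W_g = I, i.e. D_1 = I and D_2 = diag(lam).
lam, Wg = eigh(R2, R1)

# Eq. (22): rescale the columns by G^{-1/2} with G = D_1 + D_2 = I + diag(lam),
# so that the rescaled W satisfies the scaling condition (5).
W = Wg / np.sqrt(1.0 + lam)   # divides column j of Wg by sqrt(1 + lam_j)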

[1] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, “Designing optimal spatial filters for single-trial EEG classification in a movement task,” Clinical Neurophysiology, vol. 110, no. 5, pp. 787–798, 1999. doi:10.1016/S1388-2457(98)00038-8
[2] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008. doi:10.1109/MSP.2008.4408441
[3] L. C. Parra, C. D. Spence, A. D. Gerson, and P. Sajda, “Recipes for the linear analysis of EEG,” NeuroImage, vol. 28, no. 2, pp. 326–341, 2005. doi:10.1016/j.neuroimage.2005.05.032