Whitening transformation
A whitening transformation is a decorrelation transformation that transforms an arbitrary set of variables having a known covariance matrix into a set of new variables whose covariance is the identity matrix (meaning that they are uncorrelated and all have variance 1).
The transformation is called "whitening" because it changes the input vector into a white noise vector. It differs from a general decorrelation transformation in that the latter only makes the covariances equal to zero, so that the resulting covariance matrix may be any diagonal matrix.
The inverse coloring transformation transforms a vector of uncorrelated variables (a white random vector) into a vector with a specified covariance matrix.
Definition
Suppose $X$ is a random (column) vector with covariance matrix $\Sigma$ and mean $0$. One way of whitening means multiplying by $\Sigma^{-1/2}$ (when $\Sigma$ is not singular). This is called Mahalanobis or ZCA whitening. However, any other whitening matrix $W$ satisfying the condition $W^{\mathsf T} W = \Sigma^{-1}$ is also an admissible whitening transformation (Kessy et al. 2015).
The matrix $\Sigma$ can be written as the expected value of the outer product of $X$ with itself, namely:

$$\Sigma = \operatorname{E}[X X^{\mathsf T}].$$
When $\Sigma$ is symmetric and positive definite (and therefore not singular), it has a positive definite symmetric square root $\Sigma^{1/2}$, such that $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$. Since $\Sigma^{1/2}$ is positive definite, it is invertible, and the vector $Y = \Sigma^{-1/2} X$ has covariance matrix:

$$\operatorname{cov}(Y) = \Sigma^{-1/2}\,\Sigma\,(\Sigma^{-1/2})^{\mathsf T} = \Sigma^{-1/2}\,\Sigma^{1/2}\Sigma^{1/2}\,\Sigma^{-1/2} = I,$$
and is therefore a white random vector.
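As a concrete illustration, the following minimal NumPy sketch (variable names are illustrative) builds $\Sigma^{-1/2}$ from the eigendecomposition of a known covariance matrix, whitens samples drawn with that covariance, and then applies the inverse coloring transformation $\Sigma^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A known symmetric positive definite covariance matrix and correlated samples.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100_000)

# Symmetric square roots via the eigendecomposition Sigma = U diag(lam) U^T:
# Sigma^{+-1/2} = U diag(lam^{+-1/2}) U^T.
lam, U = np.linalg.eigh(Sigma)
W = U @ np.diag(lam ** -0.5) @ U.T        # Mahalanobis / ZCA whitening matrix

Y = X @ W.T                               # y = W x for each sample row x
print(np.cov(Y, rowvar=False))            # ~ 2x2 identity matrix

# The coloring transformation Sigma^{1/2} maps white vectors back to
# vectors with covariance Sigma.
C = U @ np.diag(lam ** 0.5) @ U.T
print(np.cov(Y @ C.T, rowvar=False))      # ~ Sigma again
```

Since $W = \Sigma^{-1/2}$ is symmetric here, `W.T` equals `W`; the transpose is kept only to match the row-vector layout of the samples.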
If $\Sigma$ is singular (and hence not positive definite), then $\Sigma^{1/2}$ is not invertible, and it is impossible to map $X$ to a white vector with the same number of components. In that case the vector $X$ can still be mapped to a smaller white vector $Y$ with $m$ elements, where $m$ is the number of non-zero eigenvalues of $\Sigma$.
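A sketch of this reduced whitening, assuming the non-zero eigenvalues are identified with a small relative threshold (the threshold and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Rank-deficient covariance: the third coordinate duplicates the first,
# so the 3x3 covariance matrix has only m = 2 non-zero eigenvalues.
A = rng.standard_normal((100_000, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0]])
Sigma = np.cov(X, rowvar=False)

lam, U = np.linalg.eigh(Sigma)
keep = lam > 1e-10 * lam.max()            # non-zero eigenvalues only
m = int(keep.sum())                       # m == 2 here

# Project onto the m informative eigendirections and rescale,
# giving an m-element white vector.
W = np.diag(lam[keep] ** -0.5) @ U[:, keep].T   # shape (m, 3)
Y = X @ W.T
print(m, np.cov(Y, rowvar=False))         # 2, ~ 2x2 identity matrix
```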
Whitening a data matrix
Applying many statistical methods, such as several of those in independent component analysis, to observed data matrices requires an initial step of pre-whitening. Suppose $X$ is an $n \times p$ observed data matrix whose rows, denoted by $x_i^{\mathsf T}$ for $i = 1, \ldots, n$, correspond to realizations of $p$-variate random vectors that are typically assumed to be independent and identically distributed (i.i.d.). Pre-whitening the matrix $X$ then amounts to applying a whitening transformation to its rows by first
- centering the $x_i$'s with respect to the columns of $X$ and then
- sphericizing the centered data matrix so that its sample covariance is the ($p \times p$)-dimensional identity matrix $I_p$.
The following describes those two steps in detail. Note here that the data matrix is often defined so that its columns correspond to realizations of $p$-variate random vectors: in this case, these steps will apply to the transpose $X^{\mathsf T}$.
- To center $X$, define $X_c = \left(I_n - \tfrac{1}{n}\,\mathbf{1}_n \mathbf{1}_n^{\mathsf T}\right) X$, with the matrix $X_c$ denoting the centered version of $X$. Each column of $X_c$ then has a sample mean of zero.
- The matrix $X_c$ is then sphericized into a matrix $Z$ whose sample covariance is the identity matrix $I_p$. One common way to perform the sphericizing is to use the eigenvalue decomposition to express the sample covariance of $X_c$ as
  $$S = \frac{1}{n-1} X_c^{\mathsf T} X_c = U \Lambda U^{\mathsf T},$$
  where $U$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues. Then $Z$ is defined as $Z = X_c U \Lambda^{-1/2}$, which ensures that the sample covariance of $Z$ is $I_p$.
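Put together, the two steps might be sketched as follows; the function name is hypothetical, and the sphericizing step uses the PCA-style choice $Z = X_c U \Lambda^{-1/2}$ described above (other admissible whitening matrices exist; see Kessy et al. 2015).

```python
import numpy as np

def whiten_data_matrix(X):
    """Center the columns of the n x p data matrix X, then sphericize it
    so that the sample covariance of the result is the p x p identity."""
    n, p = X.shape
    # Step 1: center with respect to the columns.
    Xc = X - X.mean(axis=0)
    # Step 2: eigendecompose the sample covariance S = U Lambda U^T
    # and set Z = Xc U Lambda^{-1/2}.
    S = Xc.T @ Xc / (n - 1)
    lam, U = np.linalg.eigh(S)
    return Xc @ U @ np.diag(lam ** -0.5)

rng = np.random.default_rng(2)
X = rng.multivariate_normal([1.0, -2.0, 0.5],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=5_000)
Z = whiten_data_matrix(X)
print(np.allclose(Z.T @ Z / (len(Z) - 1), np.eye(3)))  # True
```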
External links
- http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
- The ZCA whitening transformation. Appendix A of Learning Multiple Layers of Features from Tiny Images by A. Krizhevsky.
- A. Kessy, A. Lewin, and K. Strimmer (2015). Optimal whitening and decorrelation. arXiv:1512.00809.