Normal-inverse-Wishart distribution

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).^[1]

Definition

Suppose

\boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right)

has a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol\Sigma$ , where

\boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)

has an inverse Wishart distribution. Then $(\boldsymbol\mu,\boldsymbol\Sigma)$ has a normal-inverse-Wishart distribution, denoted as

(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) .

Characterization

Probability density function

f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)
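Because the density factorizes into a normal term and an inverse Wishart term, its logarithm is the sum of the two log densities. The following is a minimal sketch, assuming SciPy's `invwishart` (with `scale` playing the role of $\boldsymbol\Psi$ and `df` the role of $\nu$) matches the parameterization used here; the function name `niw_logpdf` and the numeric inputs are illustrative only.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal


def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log density of NIW(mu0, lam, Psi, nu) evaluated at (mu, Sigma)."""
    # log N(mu | mu0, Sigma / lam)
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    # log W^{-1}(Sigma | Psi, nu); SciPy names these parameters scale and df
    log_invwishart = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return log_normal + log_invwishart


# Illustrative inputs (assumed values, not taken from the article)
mu0 = np.zeros(2)
Psi = np.eye(2)
lam, nu = 2.0, 4.0
mu = np.array([0.1, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

value = niw_logpdf(mu, Sigma, mu0, lam, Psi, nu)
print(value)
```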

Properties

Scaling

Marginal distributions

By construction, the marginal distribution over $\boldsymbol\Sigma$ is an inverse Wishart distribution, and the conditional distribution over $\boldsymbol\mu$ given $\boldsymbol\Sigma$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol\mu$ is a multivariate t-distribution.
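For reference, the marginal over $\boldsymbol\mu$ can be written out explicitly. The form below is the standard result obtained by integrating $\boldsymbol\Sigma$ out of the joint density; it is stated here without derivation, with $D$ denoting the dimension of $\boldsymbol\mu$ and the three arguments of $t$ being degrees of freedom (subscript), location, and scale matrix.

```latex
\boldsymbol\mu \sim t_{\nu - D + 1}\!\left(\boldsymbol\mu_0,\ \frac{\boldsymbol\Psi}{\lambda\,(\nu - D + 1)}\right)
```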

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

\boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)

where $\boldsymbol{y}$ is an $n\times p$ matrix and $\boldsymbol{y_i}$ (of length $p$) is row $i$ of the matrix.

When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly

(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu).

The resulting posterior distribution for the mean and covariance matrix is also a normal-inverse-Wishart distribution

(\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n),

where

\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n}

\lambda_n = \lambda + n

\nu_n = \nu + n

\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol S + \frac{\lambda n}{\lambda+n} (\bar{\boldsymbol y}-\boldsymbol\mu_0)^T(\bar{\boldsymbol y}-\boldsymbol\mu_0)

with

\boldsymbol S = \sum_{i=1}^{n} (\boldsymbol y_i-\bar{\boldsymbol y})^T(\boldsymbol y_i-\bar{\boldsymbol y}) .
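The four update equations translate directly into a few lines of array code. The sketch below uses NumPy only; the function name `niw_posterior` and the toy data are assumptions made for illustration. Since each $\boldsymbol y_i$ is a row of the data matrix, the sum of outer products $\boldsymbol S$ is computed as the transposed centered matrix times itself.

```python
import numpy as np


def niw_posterior(y, mu0, lam, Psi, nu):
    """Return (mu_n, lam_n, Psi_n, nu_n) for data y of shape (n, p)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    centered = y - ybar
    S = centered.T @ centered            # sum of outer products, p x p
    diff = (ybar - mu0).reshape(-1, 1)   # column vector, p x 1
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    Psi_n = Psi + S + (lam * n / (lam + n)) * (diff @ diff.T)
    return mu_n, lam_n, Psi_n, nu_n


# Toy data: n = 3 observations in p = 2 dimensions (assumed values)
y = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(
    y, mu0=np.zeros(2), lam=1.0, Psi=np.eye(2), nu=3.0
)
print(mu_n, lam_n, nu_n)
print(Psi_n)
```

With these inputs the sample mean is $(2, 2)$, so the posterior location is pulled three quarters of the way from $\boldsymbol\mu_0$ toward it, giving $(1.5, 1.5)$.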

To sample from the joint posterior of $(\boldsymbol\mu,\boldsymbol\Sigma)$, first draw $\boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n)$, then draw $\boldsymbol\mu | \boldsymbol\Sigma,\boldsymbol y \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n)$. To draw from the posterior predictive of a new observation, draw $\tilde{\boldsymbol y}|\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol y \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)$, given the already drawn values of $\boldsymbol\mu$ and $\boldsymbol\Sigma$.^[2]
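The three draws can be chained in code. The following sketch assumes the posterior parameters have already been computed and that SciPy's `invwishart.rvs` accepts a NumPy `Generator` as `random_state`; the function name `draw_posterior_predictive` and the numeric parameter values are illustrative.

```python
import numpy as np
from scipy.stats import invwishart


def draw_posterior_predictive(mu_n, lam_n, Psi_n, nu_n, rng):
    """One joint draw (mu, Sigma, y_new) given posterior NIW parameters."""
    # Sigma | y ~ W^{-1}(Psi_n, nu_n)
    Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
    # mu | Sigma, y ~ N(mu_n, Sigma / lam_n)
    mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
    # y_new | mu, Sigma ~ N(mu, Sigma)
    y_new = rng.multivariate_normal(mu, Sigma)
    return mu, Sigma, y_new


# Assumed posterior parameters for a 2-dimensional problem
rng = np.random.default_rng(0)
mu_n = np.array([1.5, 1.5])
Psi_n = np.array([[6.0, 1.0], [1.0, 12.0]])
mu, Sigma, y_new = draw_posterior_predictive(mu_n, 4.0, Psi_n, 6.0, rng)
print(mu, y_new)
```

Repeating the call many times and keeping only `y_new` gives samples from the posterior predictive distribution with $\boldsymbol\mu$ and $\boldsymbol\Sigma$ integrated out.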

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

Sample $\boldsymbol\Sigma$ from an inverse Wishart distribution with parameters $\boldsymbol\Psi$ and $\nu$
Sample $\boldsymbol\mu$ from a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance $\tfrac{1}{\lambda}\boldsymbol\Sigma$
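The two steps above can be sketched as a batched sampler. This is one possible implementation, assuming SciPy's `invwishart` for the first step and a Cholesky factor of each $\boldsymbol\Sigma/\lambda$ for the second; the function name `sample_niw` and the parameter values are hypothetical. The check at the end relies on the known inverse Wishart mean $\boldsymbol\Psi/(\nu-p-1)$, valid for $\nu > p+1$.

```python
import numpy as np
from scipy.stats import invwishart


def sample_niw(mu0, lam, Psi, nu, size, rng):
    """Draw `size` pairs (mu, Sigma) from NIW(mu0, lam, Psi, nu)."""
    mu0 = np.asarray(mu0, dtype=float)
    p = mu0.shape[0]
    # Step 1: Sigma ~ W^{-1}(Psi, nu)
    Sigmas = invwishart.rvs(df=nu, scale=Psi, size=size, random_state=rng)
    Sigmas = Sigmas.reshape(size, p, p)
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam), using a Cholesky factor per draw
    L = np.linalg.cholesky(Sigmas / lam)
    z = rng.standard_normal((size, p, 1))
    mus = mu0 + (L @ z)[..., 0]
    return mus, Sigmas


rng = np.random.default_rng(1)
mu0 = np.array([1.0, -1.0])
Psi = np.eye(2)
mus, Sigmas = sample_niw(mu0, lam=2.0, Psi=Psi, nu=10.0, size=20000, rng=rng)

# Monte Carlo averages should be near mu0 and Psi / (nu - p - 1) = Psi / 7
print(mus.mean(axis=0))
print(Sigmas.mean(axis=0))
```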

Related distributions

The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If $(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)$ then $(\boldsymbol\mu,\boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi^{-1},\nu)$ .
The normal-inverse-gamma distribution is the one-dimensional equivalent.
The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.

Notes

[1] Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
[2] Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.

References

Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."



Notation	$(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)$
Parameters	$\boldsymbol\mu_0\in\mathbb{R}^D$ location (vector of real); $\lambda > 0$ (real); $\boldsymbol\Psi \in\mathbb{R}^{D\times D}$ inverse scale matrix (pos. def.); $\nu > D-1$ (real)
Support	$\boldsymbol\mu\in\mathbb{R}^D$; $\boldsymbol\Sigma \in\mathbb{R}^{D\times D}$ covariance matrix (pos. def.)
PDF	$f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}(\boldsymbol\mu|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma)\ \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)$