Normal-inverse-Wishart distribution

From Infogalactic: the planetary knowledge core
normal-inverse-Wishart
Notation (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)
Parameters \boldsymbol\mu_0\in\mathbb{R}^D\, location (vector of real)
\lambda > 0\, (real)
\boldsymbol\Psi \in\mathbb{R}^{D\times D} inverse scale matrix (pos. def.)
\nu > D-1\, (real)
Support \boldsymbol\mu\in\mathbb{R}^D ; \boldsymbol\Sigma \in\mathbb{R}^{D\times D} covariance matrix (pos. def.)
PDF f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}(\boldsymbol\mu|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma)\ \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Definition

Suppose

  \boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right)

has a multivariate normal distribution with mean \boldsymbol\mu_0 and covariance matrix \tfrac{1}{\lambda}\boldsymbol\Sigma, where

\boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)

has an inverse Wishart distribution. Then (\boldsymbol\mu,\boldsymbol\Sigma) has a normal-inverse-Wishart distribution, denoted as

 (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu).

Characterization

Probability density function

f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)
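Because the density factors into a normal term and an inverse Wishart term, its logarithm can be evaluated by summing the two log-densities. The following is a minimal sketch, assuming SciPy's `multivariate_normal` and `invwishart` distributions; the function name `niw_logpdf` and the example values are illustrative and not part of the article.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal


def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu) evaluated at the pair (mu, Sigma)."""
    # Normal factor: mu | Sigma ~ N(mu0, Sigma / lam)
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    # Inverse Wishart factor: Sigma ~ W^{-1}(Psi, nu)
    log_iw = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return log_normal + log_iw


# Example evaluation in D = 2 dimensions (values chosen arbitrarily).
mu0 = np.zeros(2)
Psi = np.eye(2)
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
value = niw_logpdf(np.array([0.1, -0.2]), Sigma, mu0, lam=2.0, Psi=Psi, nu=4.0)
```

Working on the log scale avoids underflow when the density is very small, which is common in higher dimensions.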

Properties

Scaling

Marginal distributions

By construction, the marginal distribution over \boldsymbol\Sigma is an inverse Wishart distribution, and the conditional distribution over \boldsymbol\mu given \boldsymbol\Sigma is a multivariate normal distribution. The marginal distribution over \boldsymbol\mu is a multivariate t-distribution.
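For reference, the marginal over \boldsymbol\mu can be written out explicitly. The expression below is the standard result from conjugate analysis of the multivariate normal, restated in this article's notation; the subscript is the degrees of freedom, followed by the location vector and the scale matrix.

```latex
\boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0,\ \frac{\boldsymbol\Psi}{\lambda(\nu-D+1)}\right)
```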

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

\boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)

where \boldsymbol{y} is an n\times p matrix and \boldsymbol{y_i} (of length p) is row i of the matrix. Here p plays the role of the dimension D used in the definition above.

When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly


(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu).

The resulting posterior distribution for the mean and covariance matrix is also normal-inverse-Wishart


(\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n),

where


\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n}

\lambda_n = \lambda + n

\nu_n = \nu + n

\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol{S} + \frac{\lambda n}{\lambda+n} (\bar{\boldsymbol y}-\boldsymbol\mu_0)^T(\bar{\boldsymbol y}-\boldsymbol\mu_0), \quad \text{with} \quad \boldsymbol{S} = \sum_{i=1}^{n} (\boldsymbol{y_i}-\bar{\boldsymbol y})^T(\boldsymbol{y_i}-\bar{\boldsymbol y}).
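The update equations translate directly into a few lines of array code. The sketch below assumes NumPy and data stored as an n\times p array with one observation per row; the function name `niw_posterior` and the toy data are illustrative only.

```python
import numpy as np


def niw_posterior(y, mu0, lam, Psi, nu):
    """Return (mu_n, lam_n, Psi_n, nu_n) for data y of shape (n, p)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    centered = y - ybar
    S = centered.T @ centered            # scatter matrix about the sample mean
    diff = (ybar - mu0).reshape(-1, 1)   # column vector, so diff @ diff.T is p x p
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    Psi_n = Psi + S + (lam * n / (lam + n)) * (diff @ diff.T)
    return mu_n, lam_n, Psi_n, nu_n


# Toy example: three observations in p = 2 dimensions.
y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, mu0=np.zeros(2), lam=1.0,
                                         Psi=np.eye(2), nu=4.0)
```

For this toy data the sample mean is (3, 4), so the posterior mean is pulled three quarters of the way from the prior mean toward it, giving (2.25, 3).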


To sample from the joint posterior of (\boldsymbol\mu,\boldsymbol\Sigma), first draw \boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n), then draw \boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n). To draw from the posterior predictive distribution of a new observation, draw \tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma), given the already drawn values of \boldsymbol\mu and \boldsymbol\Sigma.[2]
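The three draws can be chained in a simple loop. This is a sketch under stated assumptions: it uses SciPy's `invwishart.rvs` and a NumPy `Generator`, takes \boldsymbol\Sigma/\lambda_n as the conditional covariance of \boldsymbol\mu (as the NIW definition implies), and the function name `sample_posterior` and the numeric posterior parameters are made up for illustration.

```python
import numpy as np
from scipy.stats import invwishart


def sample_posterior(mu_n, lam_n, Psi_n, nu_n, n_draws, rng):
    """Draw (mu, Sigma, y_new) triples from the joint posterior and predictive."""
    mus, Sigmas, y_news = [], [], []
    for _ in range(n_draws):
        # Step 1: Sigma | y ~ W^{-1}(Psi_n, nu_n)
        Sigma = np.atleast_2d(
            invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng))
        # Step 2: mu | Sigma, y ~ N(mu_n, Sigma / lam_n)
        mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
        # Step 3: y_new | mu, Sigma ~ N(mu, Sigma)
        y_new = rng.multivariate_normal(mu, Sigma)
        mus.append(mu)
        Sigmas.append(Sigma)
        y_news.append(y_new)
    return np.array(mus), np.array(Sigmas), np.array(y_news)


# Illustrative posterior parameters for p = 2 (not taken from the article).
rng = np.random.default_rng(0)
mu_n = np.array([2.25, 3.0])
Psi_n = np.array([[15.75, 17.0], [17.0, 21.0]])
mus, Sigmas, y_news = sample_posterior(mu_n, 4.0, Psi_n, 7.0,
                                       n_draws=2000, rng=rng)
```

Each predictive draw reuses the \boldsymbol\mu and \boldsymbol\Sigma from the same iteration, so the collected `y_news` reflect parameter uncertainty as well as sampling noise.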

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

  1. Sample \boldsymbol\Sigma from an inverse Wishart distribution with parameters \boldsymbol\Psi and \nu
  2. Sample \boldsymbol\mu from a multivariate normal distribution with mean \boldsymbol\mu_0 and covariance matrix \tfrac{1}{\lambda} \boldsymbol\Sigma
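The two steps above can be sketched in vectorized form, drawing many pairs at once. The code assumes SciPy's `invwishart.rvs` for step 1 and uses a batched Cholesky factor for step 2; the function name `sample_niw` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import invwishart


def sample_niw(mu0, lam, Psi, nu, n_draws, rng):
    """Draw n_draws pairs (mu, Sigma) from NIW(mu0, lam, Psi, nu)."""
    mu0 = np.asarray(mu0, dtype=float)
    D = mu0.shape[0]
    # Step 1: Sigma ~ W^{-1}(Psi, nu). The reshape guards against SciPy
    # squeezing singleton axes when n_draws == 1 or D == 1.
    Sigmas = invwishart.rvs(df=nu, scale=Psi, size=n_draws, random_state=rng)
    Sigmas = np.reshape(Sigmas, (n_draws, D, D))
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam). With Sigma = L L^T and
    # z ~ N(0, I), the vector L z / sqrt(lam) has covariance Sigma / lam.
    L = np.linalg.cholesky(Sigmas)
    z = rng.standard_normal((n_draws, D))
    mus = mu0 + np.einsum("nij,nj->ni", L, z) / np.sqrt(lam)
    return mus, Sigmas


rng = np.random.default_rng(1)
mus, Sigmas = sample_niw(np.zeros(2), lam=2.0, Psi=2.0 * np.eye(2),
                         nu=10.0, n_draws=5000, rng=rng)
```

A quick sanity check is to compare the sample average of the \boldsymbol\Sigma draws with the inverse Wishart mean \boldsymbol\Psi/(\nu-D-1), which exists for \nu > D+1.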

Related distributions

Notes

  1. Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
  2. Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.

References

  • Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
  • Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."