Jensen–Shannon divergence

In probability theory and statistics, the Jensen–Shannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as the information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences: it is symmetric, and it always takes a finite value. The square root of the Jensen–Shannon divergence is a metric, often referred to as the Jensen–Shannon distance.[3][4]

Definition

Consider the set M_+^1(A) of probability distributions where A is a set provided with some σ-algebra of measurable subsets. In particular we can take A to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) {\rm JSD}: M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty{}) is a symmetrized and smoothed version of the Kullback–Leibler divergence D(P \parallel Q). It is defined by

{\rm JSD}(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M)

where M=\frac{1}{2}(P+Q) is the mixture of the two distributions.
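
As an illustration, the two-distribution JSD can be computed directly from this formula; in the following minimal Python sketch the distributions p and q and the helper name jsd are made-up examples, and scipy.stats.entropy(p, m) supplies the Kullback–Leibler divergence D(P \parallel M) (assuming SciPy is available).

    # Minimal sketch; p, q and jsd are illustrative names, not from the article.
    import numpy as np
    from scipy.stats import entropy   # entropy(p, q, base) = Kullback-Leibler divergence D(p || q)

    def jsd(p, q, base=2):
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        m = 0.5 * (p + q)             # the mixture M = (P + Q) / 2
        return 0.5 * entropy(p, m, base=base) + 0.5 * entropy(q, m, base=base)

    p = [0.4, 0.6, 0.0]
    q = [0.1, 0.1, 0.8]
    print(jsd(p, q))                  # same value as jsd(q, p): the divergence is symmetric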

A more general definition, allowing for the comparison of more than two probability distributions, is:

{\rm JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i)

where \pi_1, \ldots, \pi_n are weights that are selected for the probability distributions P_1, P_2, \ldots, P_n and H(P) is the Shannon entropy for distribution P. For the two-distribution case described above,

P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2}.\
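
As a sketch of this weighted form, the divergence can be evaluated as the Shannon entropy of the mixture minus the weighted sum of the component entropies; in the Python fragment below the three distributions, the weights, and the helper name jsd_general are arbitrary illustrative choices.

    # Sketch of the weighted, n-distribution definition; the values are made up.
    import numpy as np
    from scipy.stats import entropy   # entropy(p, base=...) = Shannon entropy H(p)

    def jsd_general(dists, weights, base=2):
        dists = np.asarray(dists, dtype=float)       # shape (n, k): n distributions over k outcomes
        weights = np.asarray(weights, dtype=float)   # shape (n,): the weights pi_1, ..., pi_n
        mixture = weights @ dists                    # sum_i pi_i P_i
        return entropy(mixture, base=base) - sum(
            w * entropy(d, base=base) for w, d in zip(weights, dists))

    dists = [[0.5, 0.5, 0.0],
             [0.0, 0.5, 0.5],
             [0.25, 0.5, 0.25]]
    weights = [0.25, 0.25, 0.5]
    print(jsd_general(dists, weights))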

Bounds

The Jensen–Shannon divergence between two distributions is bounded by 1, given that one uses the base-2 logarithm.[5]

0 \leq {\rm JSD}( P \parallel Q ) \leq 1

For log base e, or ln, which is commonly used in statistical thermodynamics, the upper bound is ln(2):

0 \leq {\rm JSD}( P \parallel Q ) \leq \ln(2)
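
These bounds can be checked numerically; in the sketch below the two made-up distributions have disjoint support, the extreme case in which the upper bound is attained.

    # Numerical check of the bounds with two made-up distributions of disjoint support.
    import numpy as np
    from scipy.stats import entropy

    p, q = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    m = 0.5 * (p + q)
    print(0.5 * entropy(p, m, base=2) + 0.5 * entropy(q, m, base=2))   # 1.0 (base-2 bound)
    print(0.5 * entropy(p, m) + 0.5 * entropy(q, m))                   # 0.693... = ln(2) (natural log)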

Relation to mutual information

The Jensen–Shannon divergence is the mutual information between a random variable X drawn from the mixture distribution of P and Q and the binary indicator variable Z used to switch between P and Q in producing the mixture. Concretely, let Z take the values 0 and 1 with probability 1/2 each, and draw X according to P if Z = 0 and according to Q if Z = 1. Then X is distributed according to the probability measure M=(P+Q)/2, the mixture distribution, and we compute

\begin{align}
I(X; Z) &= H(X) - H(X|Z)\\
&= -\sum M \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\
&= -\sum \frac{P}{2} \log M - \sum \frac{Q}{2} \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\
&= \frac{1}{2} \sum P \left( \log P - \log M\right ) + \frac{1}{2} \sum Q  \left( \log Q - \log M \right) \\
&= {\rm JSD}(P \parallel Q)
\end{align}

It follows from the above result that the Jensen–Shannon divergence is bounded by 0 and 1, because mutual information is non-negative and bounded by H(Z) = 1 bit. The JSD is not always bounded by 0 and 1: the upper limit of 1 arises here because we are considering the specific case of a binary indicator variable Z and the base-2 logarithm; the general JSD of n distributions is bounded by log_2(n).
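
The identity above can also be verified numerically; in the following sketch the example distributions p and q are arbitrary, and I(X; Z) is computed as H(X) - H(X|Z) for the fair-coin mixture construction and compared with the JSD formula from the Definition section.

    # Numerical check that the mutual information of the mixture construction
    # equals JSD(P || Q); p and q are arbitrary example distributions.
    import numpy as np
    from scipy.stats import entropy

    p = np.array([0.2, 0.5, 0.3])
    q = np.array([0.6, 0.1, 0.3])
    m = 0.5 * (p + q)                                    # distribution of X under the mixture

    h_x = entropy(m, base=2)                                           # H(X)
    h_x_given_z = 0.5 * entropy(p, base=2) + 0.5 * entropy(q, base=2)  # H(X | Z), Z a fair coin
    mutual_information = h_x - h_x_given_z

    jsd = 0.5 * entropy(p, m, base=2) + 0.5 * entropy(q, m, base=2)
    print(mutual_information, jsd)                       # the two numbers coincide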

One can apply the same principle to a joint distribution and the product of its two marginal distributions (in analogy to the Kullback–Leibler divergence and mutual information) to measure how reliably one can decide whether a given response comes from the joint distribution or the product distribution, subject to the assumption that these are the only two possibilities.[6]

Quantum Jensen–Shannon divergence

Generalizing from probability distributions to density matrices allows one to define the quantum Jensen–Shannon divergence (QJSD).[7][8] It is defined for a set of density matrices (\rho_1,\ldots,\rho_n) and a probability distribution \pi=(\pi_1,\ldots,\pi_n) as

{\rm QJSD}(\rho_1,\ldots,\rho_n)= S\left(\sum_{i=1}^n \pi_i \rho_i\right)-\sum_{i=1}^n \pi_i S(\rho_i)

where S(\rho) is the von Neumann entropy of the density matrix \rho. This quantity was introduced in quantum information theory, where it is called the Holevo information: it gives an upper bound on the amount of classical information encoded by the quantum states (\rho_1,\ldots,\rho_n) under the prior distribution \pi (see Holevo's theorem).[9] The quantum Jensen–Shannon divergence for \pi=\left(\frac{1}{2},\frac{1}{2}\right) and two density matrices is a symmetric function, everywhere defined, bounded, and equal to zero only if the two density matrices are the same. It is the square of a metric for pure states,[10] but it is unknown whether the metric property holds in general.[8] The Bures metric is closely related to the quantum JS divergence; it is the quantum analog of the Fisher information metric.
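
As a numerical sketch of the equal-weight, two-state case, the QJSD can be computed from the eigenvalues of the density matrices; in the Python fragment below the single-qubit density matrices rho1 and rho2 and the helper names von_neumann_entropy and qjsd are illustrative assumptions.

    # Sketch of the equal-weight, two-state QJSD; rho1 and rho2 are made-up
    # single-qubit density matrices, and the helper names are illustrative.
    import numpy as np

    def von_neumann_entropy(rho, base=2):
        # S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho
        eigvals = np.linalg.eigvalsh(rho)
        eigvals = eigvals[eigvals > 1e-12]               # drop zero eigenvalues (0 log 0 = 0)
        return float(-np.sum(eigvals * np.log(eigvals)) / np.log(base))

    def qjsd(rho1, rho2, base=2):
        sigma = 0.5 * (rho1 + rho2)                      # equal-weight mixture of the two states
        return von_neumann_entropy(sigma, base) - 0.5 * (
            von_neumann_entropy(rho1, base) + von_neumann_entropy(rho2, base))

    rho1 = np.array([[1.0, 0.0], [0.0, 0.0]])            # pure state |0><0|
    rho2 = np.array([[0.5, 0.5], [0.5, 0.5]])            # pure state |+><+|
    print(qjsd(rho1, rho2))                              # bounded by 1, zero only for identical states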

Applications

The Jensen–Shannon divergence has been applied in bioinformatics and genome comparison,[11][12] in protein surface comparison,[13] in the social sciences,[14] in the quantitative study of history,[15] and in machine learning.[16]

Notes

  1. Lua error in package.lua at line 80: module 'strict' not found.
  2. Lua error in package.lua at line 80: module 'strict' not found.
  3. Lua error in package.lua at line 80: module 'strict' not found.
  4. Lua error in package.lua at line 80: module 'strict' not found.
  5. Lua error in package.lua at line 80: module 'strict' not found.
  6. Lua error in package.lua at line 80: module 'strict' not found.
  7. Lua error in package.lua at line 80: module 'strict' not found.
  8. 8.0 8.1 Lua error in package.lua at line 80: module 'strict' not found.
  9. Holevo, A. S. (1973). "Bounds for the quantity of information transmitted by a quantum communication channel". Problemy Peredachi Informatsii (in Russian) 9(3): 3–11. English translation: Probl. Inf. Transm., 9, 177–183 (1975). MR 456936
  10. Lua error in package.lua at line 80: module 'strict' not found.
  11. Lua error in package.lua at line 80: module 'strict' not found.
  12. Lua error in package.lua at line 80: module 'strict' not found.
  13. Lua error in package.lua at line 80: module 'strict' not found.
  14. Lua error in package.lua at line 80: module 'strict' not found.
  15. Lua error in package.lua at line 80: module 'strict' not found.
  16. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Networks", NIPS 2014. http://arxiv.org/abs/1406.2661
