Concentration inequality

In probability theory, concentration inequalities provide bounds on how a random variable deviates from some value (typically, its expected value). The laws of large numbers of classical probability theory state that sums of independent random variables are, under very mild conditions, close to their expectation with high probability. Such sums are the most basic examples of random variables concentrated around their mean. More recent results show that this behavior is shared by many other functions of independent random variables.

Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.

Markov's inequality

Markov's inequality requires only the following information about a random variable X:

  • X takes only non-negative values.
  • Its expected value \textrm{E}[X] is finite.

Then, for every constant a > 0:

\Pr(X \geq a) \leq \frac{\textrm{E}(X)}{a}.

Markov's inequality extends to a strictly increasing and non-negative function \Phi:

\Pr(X \geq a) = \Pr(\Phi (X) \geq \Phi (a)) \leq \frac{\textrm{E}(\Phi(X))}{\Phi (a)}.
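
As an illustration, the following Python sketch estimates the tail probability of a non-negative random variable by simulation and compares it with the Markov bound \textrm{E}[X]/a. The Exponential(1) distribution, the sample size and the thresholds are arbitrary choices made only for this example.

    import random

    # Monte-Carlo illustration of Markov's inequality for a non-negative
    # random variable (here Exponential(1), an arbitrary choice).
    random.seed(0)
    samples = [random.expovariate(1.0) for _ in range(100_000)]
    mean = sum(samples) / len(samples)          # estimate of E[X] (true value: 1)

    for a in [2.0, 3.0, 5.0]:
        tail = sum(x >= a for x in samples) / len(samples)   # empirical Pr(X >= a)
        print(f"a={a}: empirical tail {tail:.4f} <= Markov bound {mean / a:.4f}")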

Chebyshev's inequality

Chebyshev's inequality requires the following information about a random variable X:

  • The expected value \operatorname{E}[X] is finite.
  • The variance \operatorname{Var}[X] = \operatorname{E}[(X - \operatorname{E}[X] )^2] is finite.

Then, for every constant a > 0:

\Pr(|X-\operatorname{E}[X]| \geq a) \leq \frac{\operatorname{Var}[X]}{a^2},

or equivalently:

\Pr(|X-\operatorname{E}[X]| \geq a\cdot \operatorname{Std}[X]) \leq \frac{1}{a^2}.

Chebyshev's inequality can be seen as a special case of the generalized Markov's inequality applied to the random variable |X-\operatorname{E}[X]| with \Phi(x) = x^2.
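
A similar simulation can be used to check Chebyshev's inequality in the form \Pr(|X-\operatorname{E}[X]| \geq a\cdot \operatorname{Std}[X]) \leq 1/a^2. In the sketch below, the Uniform(0, 1) distribution, the sample size and the values of a are arbitrary choices.

    import random
    import statistics

    # Monte-Carlo illustration of Chebyshev's inequality, with deviations
    # measured in units of the standard deviation (Uniform(0, 1) is an
    # arbitrary example distribution).
    random.seed(0)
    samples = [random.random() for _ in range(100_000)]
    mu = statistics.fmean(samples)     # estimate of E[X]
    sd = statistics.pstdev(samples)    # estimate of Std[X]

    for a in [1.5, 2.0, 3.0]:
        tail = sum(abs(x - mu) >= a * sd for x in samples) / len(samples)
        print(f"a={a}: empirical {tail:.4f} <= 1/a^2 = {1 / (a * a):.4f}")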

Chernoff bounds

The generic Chernoff bound[1]:63–65 requires only the moment generating function of X, defined as M_X(t) := \mathbb{E}\!\left[e^{tX}\right], provided it exists. Applying Markov's inequality to e^{tX} gives, for every t>0:

\Pr(X \geq a) \leq \frac{\textrm{E}[e^{t\cdot X}]}{e^{t\cdot a}},

and for every t<0:

\Pr(X \leq a) \leq \frac{\textrm{E}[e^{t\cdot X}]}{e^{t\cdot a}}.

Different Chernoff bounds are obtained for different distributions by choosing, and typically optimizing over, the parameter t.
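
As an illustration, the sketch below evaluates the generic bound for a standard normal variable, whose moment generating function \textrm{E}[e^{tX}] = e^{t^2/2} is known in closed form; optimizing e^{t^2/2 - ta} over t > 0 gives e^{-a^2/2}. The grid used for the numerical optimization is an arbitrary choice.

    import math

    # Generic Chernoff bound for a standard normal X, using its moment
    # generating function M_X(t) = exp(t**2 / 2).  The grid search over t
    # is an arbitrary illustrative choice; the optimum is at t = a.
    def chernoff_bound(a, t_grid=None):
        if t_grid is None:
            t_grid = [k / 100 for k in range(1, 1001)]   # t in (0, 10]
        return min(math.exp(0.5 * t * t - t * a) for t in t_grid)

    for a in [1.0, 2.0, 3.0]:
        print(f"a={a}: grid-optimized bound {chernoff_bound(a):.5f}, "
              f"closed form exp(-a^2/2) = {math.exp(-0.5 * a * a):.5f}")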

Bounds on sums of independent variables

Let X_1,\dots,X_n be independent random variables such that, for all i:

  • a_i\leq X_i\leq b_i almost surely;
  • c_i := b_i-a_i;
  • c_i \leq C.

Let S_n be their sum, E_n its expected value and V_n its variance:

S_n := \sum_{i=1}^n X_i
E_n := E[S_n] = \sum_{i=1}^n E[X_i]
V_n := Var[S_n] = \sum_{i=1}^n Var[X_i]

It is often of interest to bound the difference between the sum and its expected value. Several inequalities can be used; a numerical comparison of some of the resulting bounds is sketched after the list below.

1. Hoeffding's inequality says that:

\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right)< 2 \exp \left(-\frac{2t^2}{n C^2} \right)

2. The random variable S_n-E_n is the terminal value of a martingale: M_k := S_k - E_k satisfies M_0 = 0 and |M_k - M_{k-1}| = |X_k - \operatorname{E}[X_k]| \leq c_k. Hence, Azuma's inequality can also be used, and it yields a similar bound:

\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{t^2}{2\sum_{i=1}^n c_i^2}\right)< 2 \exp \left(-\frac{t^2}{2 n C^2} \right)

This is a generalization of Hoeffding's inequality, since it applies not only to sums of independent variables but to general martingales, as well as to supermartingales and submartingales.

3. The sum S_n=f(X_1,\dots,X_n) is a special case of a function of n variables. This function changes in a bounded way: if variable i is changed, the value of f changes by at most c_i = b_i-a_i \leq C. Hence, McDiarmid's inequality can also be used, and it yields a similar bound:

\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right)< 2 \exp \left(-\frac{2t^2}{n C^2} \right)

This is a different generalization of Hoeffding's inequality, since it can handle functions other than the sum, as long as changing any single variable changes their value in a bounded way.

4. Bennett's inequality offers some improvement over Hoeffding's when the variances of the summands are small compared to their almost-sure bounds C. It says that:

\Pr\left[|S_n-E_n| > t \right] \leq 2\exp\left[- \frac{V_n}{C^2} h\!\left(\frac{C t}{V_n} \right)\right], \qquad \text{where } h(u) = (1+u)\log(1+u)-u.

5. Bernstein's inequality states that:

\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{t^2/2}{V_n + C\cdot t/3} \right)

This is a generalization of Hoeffding's inequality, since it takes the variance into account; moreover, versions of Bernstein's inequality can handle not only independent variables but also weakly dependent variables.

6. Chernoff bounds have a particularly simple form in the case of sums of independent variables, since \textrm{E}[e^{t\cdot S_n}] = \prod_{i=1}^n {\textrm{E}[e^{t\cdot X_i}]}.

For example,[2] suppose the variables X_i satisfy X_i \geq E(X_i)-a_i-M, for 1 \leq i \leq n. Then we have the following lower tail inequality:

\Pr[S_n - E_n < -\lambda]\leq e^{-\frac{\lambda^2}{2(V_n+\sum_{i=1}^n a_i^2+M\lambda/3)}}

If each X_i satisfies X_i \leq E(X_i)+a_i+M, we have the following upper tail inequality:

\Pr[S_n - E_n > \lambda]\leq e^{-\frac{\lambda^2}{2(V_n + \sum_{i=1}^n a_i^2+M\lambda/3)}}

If the X_i are i.i.d., |X_i| \leq 1 and \sigma^2 is the variance of X_i, a typical version of the Chernoff inequality is:

\Pr[|S_n| \geq k\sigma]\leq 2e^{-k^2/4n} for 0 \leq k\leq 2\sigma

7. Similar bounds can be found in: Rademacher distribution#Bounds on sums
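
As a rough numerical comparison of the bounds above, the sketch below evaluates the two-sided Hoeffding, Bennett and Bernstein bounds for a sum of n i.i.d. Bernoulli(p) variables, for which c_i = C = 1 and V_n = np(1-p). The values of n, p and t are arbitrary choices; when p is small, the variance-sensitive bounds are visibly tighter than Hoeffding's.

    import math

    # Two-sided Hoeffding, Bennett and Bernstein bounds for a sum of
    # n i.i.d. Bernoulli(p) variables (c_i = C = 1, V_n = n*p*(1-p)).
    # n, p and the deviations t are arbitrary choices for this sketch.
    n, p, C = 1000, 0.1, 1.0
    V = n * p * (1 - p)

    def hoeffding(t):
        return 2 * math.exp(-2 * t**2 / (n * C**2))

    def bennett(t):
        h = lambda u: (1 + u) * math.log(1 + u) - u
        return 2 * math.exp(-(V / C**2) * h(C * t / V))

    def bernstein(t):
        return 2 * math.exp(-(t**2 / 2) / (V + C * t / 3))

    for t in [20, 40, 60]:
        print(f"t={t}: Hoeffding {hoeffding(t):.3e}, "
              f"Bennett {bennett(t):.3e}, Bernstein {bernstein(t):.3e}")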

Asymptotic behavior of binomial distribution

If a random variable X follows the binomial distribution with parameters n and p, the probability of getting exactly k successes in n trials is given by the probability mass function

 f(k;n,p) = \Pr(K = k) = {n\choose k}p^k(1-p)^{n-k}.

Let S_n=\sum_{i=1}^n X_i, where the X_i are i.i.d. Bernoulli random variables with parameter p, so that S_n follows the binomial distribution with parameters n and p. The central limit theorem implies that, as n \to \infty, S_n is approximately normally distributed with mean np and variance np(1-p), and


    \lim_{n\to\infty} \Pr[ a\sigma <S_n- np < b\sigma] = \int_a^b \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\, dx, \qquad \text{where } \sigma^2 = np(1-p).

For p=\frac{\lambda}{n}, where \lambda is a constant, the limiting distribution of the binomial distribution B(n,p) as n \to \infty is the Poisson distribution P(\lambda).
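
The sketch below compares the probability mass function of the Binomial(n, \lambda/n) distribution with that of the Poisson(\lambda) limit; the values of n, \lambda and k are arbitrary choices.

    import math

    # Comparison of the Binomial(n, p) pmf with p = lam / n against the
    # Poisson(lam) pmf; n, lam and the range of k are arbitrary choices.
    n, lam = 1000, 3.0
    p = lam / n

    def binom_pmf(k):
        return math.comb(n, k) * p**k * (1 - p)**(n - k)

    def poisson_pmf(k):
        return math.exp(-lam) * lam**k / math.factorial(k)

    for k in range(7):
        print(f"k={k}: binomial {binom_pmf(k):.5f}, Poisson {poisson_pmf(k):.5f}")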

Efron–Stein inequality

The Efron–Stein inequality (or influence inequality, or MG bound on variance) bounds the variance of a general function.

Suppose that X_1 \dots X_n, X_1' \dots X_n' are independent with X_i' and X_i having the same distribution for all i.

Let X = (X_1,\dots , X_n), X^{(i)} = (X_1, \dots , X_{i-1}, X_i',X_{i+1}, \dots , X_n). Then


\mathrm{Var}(f(X)) \leq \frac{1}{2} \sum_{i=1}^{n} E[(f(X)-f(X^{(i)}))^2].
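
Both sides of the Efron–Stein inequality can be estimated by simulation. In the sketch below, the function f(X) = \max(X_1,\dots,X_n), the Uniform(0, 1) distribution, n and the number of trials are arbitrary choices.

    import random

    # Monte-Carlo illustration of the Efron-Stein inequality for
    # f(X) = max(X_1, ..., X_n) with X_i i.i.d. Uniform(0, 1).
    random.seed(0)
    n, trials = 10, 20_000

    f_values, es_terms = [], []
    for _ in range(trials):
        x = [random.random() for _ in range(n)]
        fx = max(x)
        f_values.append(fx)
        # Resample one coordinate at a time to form X^(i).
        total = 0.0
        for i in range(n):
            x_i = x[:i] + [random.random()] + x[i + 1:]
            total += (fx - max(x_i)) ** 2
        es_terms.append(0.5 * total)

    mean_f = sum(f_values) / trials
    var_f = sum((v - mean_f) ** 2 for v in f_values) / trials
    bound = sum(es_terms) / trials
    print(f"Var(f(X)) ~ {var_f:.5f} <= Efron-Stein bound ~ {bound:.5f}")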

Dvoretzky–Kiefer–Wolfowitz inequality

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the difference between the true and the empirical cumulative distribution function.

Given a natural number n, let X_1, X_2, \dots, X_n be real-valued independent and identically distributed random variables with cumulative distribution function F(\cdot). Let F_n denote the associated empirical distribution function defined by

F_n(x) = \frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\leq x\}},\qquad x\in\mathbb{R}.

So F(x) is the probability that a single random variable X is at most x, and F_n(x) is the fraction of the random variables X_1,\dots,X_n that are at most x.

Then:

\Pr\Bigl(\sup_{x\in\mathbb R} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \Bigr) \le e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon\geq\sqrt{\tfrac{1}{2n}\ln2}.
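
As an illustration, the sketch below estimates, for i.i.d. Uniform(0, 1) samples (so that F(x) = x), how often the one-sided deviation \sup_x (F_n(x) - F(x)) exceeds \varepsilon, and compares this frequency with the bound e^{-2n\varepsilon^2}. The values of n, \varepsilon and the number of repetitions are arbitrary choices (\varepsilon satisfies the condition above).

    import math
    import random

    # Empirical check of the one-sided DKW bound for Uniform(0, 1) samples,
    # where F(x) = x.  Here eps >= sqrt(ln(2) / (2 * n)) ~ 0.042 as required.
    random.seed(0)
    n, eps, reps = 200, 0.1, 2000
    exceed = 0
    for _ in range(reps):
        xs = sorted(random.random() for _ in range(n))
        # The supremum of F_n(x) - x is attained at the jump points, where
        # F_n equals (i + 1) / n at the (i + 1)-th order statistic.
        d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
        if d_plus > eps:
            exceed += 1

    print(f"empirical probability {exceed / reps:.4f} <= "
          f"DKW bound {math.exp(-2 * n * eps**2):.4f}")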

References

  1. [broken citation]
  2. [broken citation]