Dvoretzky–Kiefer–Wolfowitz inequality

In the theory of probability and statistics, the Dvoretzky–Kiefer–Wolfowitz inequality predicts how close an empirically determined distribution function will be to the distribution function from which the empirical samples are drawn. It is named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who in 1956 proved^[1] the inequality with an unspecified multiplicative constant C in front of the exponent on the right-hand side. In 1990, Pascal Massart proved the inequality with the sharp constant C = 1,^[2] confirming a conjecture due to Birnbaum and McCarty.^[3]

The DKW inequality

Given a natural number n, let X₁, X₂, …, X_n be real-valued independent and identically distributed random variables with distribution function F(·). Let F_n denote the associated empirical distribution function defined by

F_n(x) = \frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\leq x\}},\qquad x\in\mathbb{R}.
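The definition of F_n can be made concrete with a short sketch. The helper name `make_ecdf` and the sample values are illustrative assumptions, not part of the original statement; the code only counts how many observations are at most x and divides by n.

```python
from bisect import bisect_right


def make_ecdf(samples):
    """Return F_n for the given sample: x -> (number of X_i <= x) / n."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def ecdf(x):
        # bisect_right returns the number of entries <= x in the sorted list.
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical sample, chosen only to illustrate the step shape of F_n.
F_n = make_ecdf([0.3, 1.2, 0.7, 1.2])
values = [F_n(0.0), F_n(0.7), F_n(1.2), F_n(5.0)]
print(values)  # [0.0, 0.5, 1.0, 1.0]
```

Sorting once and using binary search keeps each evaluation at O(log n); ties in the sample simply produce a larger jump at that point.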

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the probability that the random function F_n differs from F by more than a given constant ε > 0 anywhere on the real line. More precisely, there is the one-sided estimate

\Pr\Bigl(\sup_{x\in\mathbb R} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \Bigr) \le e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon\geq\sqrt{\tfrac{1}{2n}\ln2},

which also implies a two-sided estimate^[4]

\Pr\Bigl(\sup_{x\in\mathbb R} |F_n(x) - F(x)| > \varepsilon \Bigr) \le 2e^{-2n\varepsilon^2}\qquad \text{for every }\varepsilon>0.
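One common use of the two-sided estimate is to build a simultaneous confidence band: setting 2e^{-2nε²} equal to a chosen level α and solving for ε gives ε = sqrt(ln(2/α) / (2n)). The sketch below assumes this rearrangement; the function names `dkw_epsilon` and `dkw_band` and the values n = 1000, α = 0.05 are illustrative choices, not taken from the text.

```python
import math


def dkw_epsilon(n, alpha):
    """Half-width eps solving 2 * exp(-2 * n * eps**2) = alpha."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def dkw_band(ecdf_value, n, alpha):
    """Band [F_n(x) - eps, F_n(x) + eps], clipped to [0, 1]."""
    eps = dkw_epsilon(n, alpha)
    return max(0.0, ecdf_value - eps), min(1.0, ecdf_value + eps)


eps = dkw_epsilon(1000, 0.05)
print(round(eps, 4))              # roughly 0.0429
print(dkw_band(0.5, 1000, 0.05))  # interval around 0.5 of half-width eps
```

By the two-sided estimate, the band F_n(x) ± ε contains F(x) for all x simultaneously with probability at least 1 − α, whatever the underlying distribution F.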

This strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence as n tends to infinity. It also estimates the tail probability of the Kolmogorov–Smirnov statistic. The inequalities above follow from the case where F is the uniform distribution on [0,1], in view of the fact^[5] that F_n has the same distribution as G_n(F), where G_n is the empirical distribution function of U₁, U₂, …, U_n, which are independent and Uniform(0,1), and noting that

\sup_{x\in\mathbb R} |F_n(x) - F(x)|\stackrel{d}{=} \sup_{x \in \mathbb R} | G_n (F(x)) - F(x) | \le \sup_{0 \le t \le 1} | G_n (t) -t | ,

with equality if and only if F is continuous.
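The reduction to the uniform case suggests a simple numerical sanity check: simulate Uniform(0,1) samples, compute sup |G_n(t) − t| exactly from the order statistics, and compare how often it exceeds ε with the bound 2e^{-2nε²}. This is only a Monte Carlo sketch; the function names, the seed, and the choices n = 50, ε = 0.2, and 2000 trials are assumptions made for illustration.

```python
import math
import random


def sup_distance_uniform(sample):
    """sup over t in [0, 1] of |G_n(t) - t| for a sample from Uniform(0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    # G_n only jumps at the order statistics, so the supremum is reached at,
    # or approached just to the left of, one of those jumps.
    return max(
        max((i + 1) / n - u, u - i / n)
        for i, u in enumerate(ordered)
    )


def exceedance_frequency(n, eps, trials, seed=0):
    """Fraction of simulated samples whose sup distance exceeds eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if sup_distance_uniform(sample) > eps:
            hits += 1
    return hits / trials


n, eps = 50, 0.2
frequency = exceedance_frequency(n, eps, trials=2000)
bound = 2.0 * math.exp(-2.0 * n * eps ** 2)
print(f"simulated frequency {frequency:.4f}, DKW bound {bound:.4f}")
```

The simulated frequency is itself random, so it can sit slightly above or below the true exceedance probability; the inequality constrains the latter, not any single simulation run.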

References

[1] Dvoretzky, Kiefer, and Wolfowitz (1956).

[2] Massart (1990).

[3] Birnbaum and McCarty.

[4] Kosorok.

[5] Shorack.
