# Empirical distribution function

In statistics, the **empirical distribution function** is the distribution function associated with the empirical measure of the sample. This cumulative distribution function is a step function that jumps up by 1/*n* at each of the *n* data points. The empirical distribution function estimates the cumulative distribution function underlying the points in the sample, and it converges to that function with probability 1 according to the Glivenko–Cantelli theorem. A number of results quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.

## Definition

Let (*x*_{1}, …, *x*_{n}) be independent, identically distributed real random variables with the common cumulative distribution function *F*(*t*). Then the **empirical distribution function** is defined as ^{[1]}^{[2]}

$$\hat F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le t\},$$

where $\mathbf{1}\{A\}$ is the indicator of event *A*. For a fixed *t*, the indicator $\mathbf{1}\{x_i \le t\}$ is a Bernoulli random variable with parameter *p* = *F*(*t*), hence $n\hat F_n(t)$ is a binomial random variable with mean *nF*(*t*) and variance $nF(t)(1 - F(t))$. This implies that $\hat F_n(t)$ is an unbiased estimator for *F*(*t*).
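The definition translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `ecdf` and the sample values are illustrative choices, not part of any particular library.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function t -> fraction of sample points that are <= t."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_hat(t):
        # bisect_right gives the number of sorted points x_i with x_i <= t
        return bisect_right(xs, t) / n

    return F_hat

# Illustrative sample with a repeated value: the step at 2.0 has height 2/n.
F_hat = ecdf([3.0, 1.0, 2.0, 2.0])
print(F_hat(0.5), F_hat(1.0), F_hat(2.0), F_hat(2.5), F_hat(3.0))
# prints: 0.0 0.25 0.75 0.75 1.0
```

Sorting once and using binary search makes each evaluation cost O(log *n*), which matters when the function is evaluated at many points.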

However, in some textbooks,^{[3]}^{[4]} the definition is given as

$$\hat F_n(t) = \frac{1}{n+1}\sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}.$$

## Asymptotic properties

Since the ratio (*n*+1) / *n* approaches 1 as *n* goes to infinity, the asymptotic properties of the two definitions that are given above are the same.
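The size of the gap between the two definitions can be made explicit. In the sketch below, $\hat F_n$ denotes the version with divisor *n* and $\tilde F_n$ the textbook variant with divisor *n* + 1; the tilde is notation introduced here only for this comparison.

```latex
\tilde F_n(t) = \frac{n}{n+1}\,\hat F_n(t),
\qquad
0 \;\le\; \hat F_n(t) - \tilde F_n(t) \;=\; \frac{\hat F_n(t)}{n+1} \;\le\; \frac{1}{n+1}.
```

The difference is therefore at most 1/(*n* + 1) uniformly in *t*, which vanishes as *n* grows.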

By the strong law of large numbers, the estimator $\hat F_n(t)$ converges to *F*(*t*) almost surely as *n* → ∞, for every value of *t*:^{[1]}

$$\hat F_n(t) \ \xrightarrow{\text{a.s.}}\ F(t),$$

thus the estimator $\hat F_n(t)$ is consistent. This expression asserts the pointwise convergence of the empirical distribution function to the true cumulative distribution function. There is a stronger result, called the Glivenko–Cantelli theorem, which states that the convergence in fact happens uniformly over *t*:^{[5]}

$$\|\hat F_n - F\|_\infty \equiv \sup_{t\in\mathbb{R}} \big|\hat F_n(t) - F(t)\big| \ \xrightarrow{\text{a.s.}}\ 0.$$

The sup-norm in this expression is called the Kolmogorov–Smirnov statistic for testing the goodness-of-fit between the empirical distribution and the assumed true cumulative distribution function *F*. Other norm functions may be reasonably used here instead of the sup-norm. For example, the L²-norm gives rise to the Cramér–von Mises statistic.
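As an illustration, the sup-norm distance between the empirical distribution function and a continuous candidate *F* can be computed exactly by checking only the jump points, since the empirical distribution function is constant between them. The sketch below assumes a continuous `cdf`; the names `ks_statistic` and `uniform_cdf` and the three sample values are hypothetical.

```python
def ks_statistic(sample, cdf):
    """Sup-norm distance sup_t |F_hat_n(t) - cdf(t)| for a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        # Just after the jump at x the ECDF equals (i + 1) / n;
        # just before the jump it equals i / n.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

def uniform_cdf(t):
    """CDF of the Uniform(0, 1) distribution."""
    return min(1.0, max(0.0, t))

# Illustrative data: the largest gap (about 0.3) occurs at the point 0.7.
print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))
```

Replacing the maximum with an integral of the squared difference would give a statistic of the Cramér–von Mises type instead.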

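The asymptotic distribution can be further characterized in several different ways. First, the central limit theorem states that *pointwise*, $\hat F_n(t)$ has an asymptotically normal distribution with the standard $\sqrt{n}$ rate of convergence:^{[1]}

$$\sqrt{n}\,\big(\hat F_n(t) - F(t)\big) \ \xrightarrow{d}\ \mathcal{N}\big(0,\ F(t)(1 - F(t))\big).$$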
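The pointwise limit can be checked numerically: the scaled error √*n* (F̂ₙ(*t*) − *F*(*t*)) should have mean near 0 and variance near *F*(*t*)(1 − *F*(*t*)). The following is a small simulation sketch under assumed settings (Uniform(0, 1) data so that *F*(*t*) = *t*, with *t* = 0.3, *n* = 200 and 2000 replications, all chosen arbitrarily for illustration).

```python
import random
import statistics

def scaled_error(n, t, rng):
    """sqrt(n) * (F_hat_n(t) - F(t)) for n Uniform(0, 1) draws, where F(t) = t."""
    count = sum(1 for _ in range(n) if rng.random() <= t)
    return n ** 0.5 * (count / n - t)

rng = random.Random(0)          # fixed seed so the run is reproducible
t, n, reps = 0.3, 200, 2000     # illustrative settings, not canonical values
draws = [scaled_error(n, t, rng) for _ in range(reps)]

# The sample variance should be close to t * (1 - t) = 0.21.
print(statistics.mean(draws), statistics.variance(draws), t * (1 - t))
```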

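This result is extended by Donsker's theorem, which asserts that the *empirical process* $\sqrt{n}\,(\hat F_n - F)$, viewed as a function indexed by $t \in \mathbb{R}$, converges in distribution in the Skorokhod space $D[-\infty, +\infty]$ to the mean-zero Gaussian process $G_F = B \circ F$, where *B* is the standard Brownian bridge.^{[5]} The covariance structure of this Gaussian process is

$$\operatorname{E}\big[\,G_F(t_1)\,G_F(t_2)\,\big] = F(t_1 \wedge t_2) - F(t_1)\,F(t_2).$$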

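The uniform rate of convergence in Donsker's theorem can be quantified by the result known as the Hungarian embedding:^{[6]}

$$\limsup_{n\to\infty}\ \frac{\sqrt{n}}{\ln^2 n}\ \big\|\sqrt{n}\,(\hat F_n - F) - G_{F,n}\big\|_\infty < \infty \quad \text{a.s.},$$

where $G_{F,n} = B_n \circ F$ for a suitable sequence of Brownian bridges $B_n$.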

Alternatively, the rate of convergence of $\sqrt{n}\,(\hat F_n - F)$ can also be quantified in terms of the asymptotic behavior of the sup-norm of this expression. A number of results exist in this vein; for example, the Dvoretzky–Kiefer–Wolfowitz inequality provides a bound on the tail probabilities of $\sqrt{n}\,\|\hat F_n - F\|_\infty$:^{[6]}

$$\Pr\!\big(\sqrt{n}\,\|\hat F_n - F\|_\infty > z\big) \le 2e^{-2z^2}.$$
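One practical use of the Dvoretzky–Kiefer–Wolfowitz bound 2·exp(−2*z*²) is a uniform confidence band. Setting the bound equal to α with *z* = √*n*·ε gives the half-width ε = √(ln(2/α) / (2*n*)). The sketch below assumes this construction; the function name `dkw_band` and the evenly spaced sample are hypothetical.

```python
import math
from bisect import bisect_right

def dkw_band(sample, alpha=0.05):
    """Return (eps, band), where band(t) gives (lower, upper) limits for F(t).

    By the DKW inequality, the band covers the true F simultaneously for
    all t with probability at least 1 - alpha.
    """
    xs = sorted(sample)
    n = len(xs)
    # Solve 2 * exp(-2 * n * eps**2) = alpha for eps.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def band(t):
        f = bisect_right(xs, t) / n   # empirical distribution function at t
        return max(0.0, f - eps), min(1.0, f + eps)

    return eps, band

# Illustrative sample: 100 evenly spaced points in [0, 1).
eps, band = dkw_band([i / 100 for i in range(100)], alpha=0.05)
print(eps, band(0.5))
```

The half-width shrinks at the rate 1/√*n*, matching the rate in the pointwise central limit theorem.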

In fact, Kolmogorov has shown that if the cumulative distribution function *F* is continuous, then the expression $\sqrt{n}\,\|\hat F_n - F\|_\infty$ converges in distribution to $\|B\|_\infty$, which has the Kolmogorov distribution and does not depend on the form of *F*.
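The Kolmogorov distribution has the well-known series form Pr(*K* ≤ *x*) = 1 − 2 Σ_{k≥1} (−1)^{k−1} exp(−2*k*²*x*²). The following sketch evaluates a truncated version of that series; the function name and the default number of terms are arbitrary choices, and the truncation is not accurate for *x* very close to 0, where the series converges slowly.

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Truncated series for P(K <= x), K having the Kolmogorov distribution."""
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        # Alternating series; terms decay like exp(-2 k^2 x^2).
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return 1.0 - 2.0 * total

# About 95% of the limiting mass lies below 1.36.
print(kolmogorov_cdf(1.36))
```

Because the limit does not depend on *F*, the same critical values apply to any continuous hypothesized distribution.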

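Another result, which follows from the law of the iterated logarithm, is that ^{[6]}

$$\limsup_{n\to\infty}\ \frac{\sqrt{n}\,\|\hat F_n - F\|_\infty}{\sqrt{2\ln\ln n}} \le \frac{1}{2} \quad \text{a.s.}$$

and

$$\liminf_{n\to\infty}\ \sqrt{2n\ln\ln n}\ \|\hat F_n - F\|_\infty = \frac{\pi}{2} \quad \text{a.s.}$$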

## See also

- Càdlàg functions
- Dvoretzky–Kiefer–Wolfowitz inequality
- Empirical probability
- Empirical process
- Frequency (statistics)
- Kaplan–Meier estimator for censored processes
- Survival function
- Distribution fitting

## References

1. van der Vaart, A.W. (1998). *Asymptotic statistics*. Cambridge University Press. p. 265. ISBN 0-521-78450-6.
2. PlanetMath
3. Coles, S. (2001). *An Introduction to Statistical Modeling of Extreme Values*. Springer. p. 36, Definition 2.4. ISBN 978-1-4471-3675-0.
4. Madsen, H.O., Krenk, S., Lind, S.C. (2006). *Methods of Structural Safety*. Dover Publications. pp. 148–149. ISBN 0486445976.
5. van der Vaart, A.W. (1998). *Asymptotic statistics*. Cambridge University Press. p. 266. ISBN 0-521-78450-6.
6. van der Vaart, A.W. (1998). *Asymptotic statistics*. Cambridge University Press. p. 268. ISBN 0-521-78450-6.

## Further reading

- Shorack, G.R.; Wellner, J.A. (1986). *Empirical Processes with Applications to Statistics*. New York: Wiley. ISBN 0-471-86725-X.
