Heavy-tailed distribution

In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded:[1] that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.

There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.

Usage of the term heavy-tailed is not uniform, and two other definitions are in use. Some authors apply it to distributions that do not have all their power moments finite; others apply it to distributions that do not have a finite variance. The definition given in this article is the most general in use: it includes all distributions covered by the alternative definitions, as well as distributions such as the log-normal that possess all their power moments yet are generally acknowledged to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)

Definitions

Definition of heavy-tailed distribution

The distribution of a random variable X with distribution function F is said to have a heavy right tail if[1]


\lim_{x \to \infty} e^{\lambda x}\Pr[X>x] = \infty \quad \mbox{for all } \lambda>0.\,

This is also written in terms of the tail distribution function

\overline{F}(x) \equiv \Pr[X>x] \,

as


\lim_{x \to \infty} e^{\lambda x}\overline{F}(x) = \infty \quad \mbox{for all } \lambda>0.\,

This is equivalent to the statement that the moment generating function of F, M_F(t), is infinite for all t > 0.[2]

The definitions of heavy-tailed for left-tailed or two-tailed distributions are analogous.
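
As a worked illustration of the definition, consider two tails chosen only for this example (they are not taken from the cited sources): a Pareto tail with exponent \alpha > 0 and an exponential tail with rate \mu > 0,

\overline{F}_{\mathrm{Par}}(x) = x^{-\alpha} \quad (x \geq 1), \qquad \overline{F}_{\mathrm{Exp}}(x) = e^{-\mu x} \quad (x \geq 0).

For the Pareto tail and any \lambda > 0,

e^{\lambda x} x^{-\alpha} = \exp(\lambda x - \alpha \ln x) \to \infty \quad \mbox{as } x \to \infty,

because \lambda x grows faster than \alpha \ln x, so the Pareto distribution has a heavy right tail. For the exponential tail,

e^{\lambda x} e^{-\mu x} = e^{(\lambda - \mu) x} \to 0 \quad \mbox{as } x \to \infty \quad \mbox{for } 0 < \lambda < \mu,

so the limit is not infinite for all \lambda > 0, and the exponential distribution is not heavy-tailed.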

Definition of long-tailed distribution

The distribution of a random variable X with distribution function F is said to have a long right tail[1] if for all t > 0,


\lim_{x \to \infty} \Pr[X>x+t|X>x] =1, \,

or equivalently


\overline{F}(x+t) \sim \overline{F}(x) \quad \mbox{as } x \to \infty. \,

Intuitively, if a quantity with a long right tail exceeds some high level, then the probability that it also exceeds any fixed higher level approaches 1: if you know the situation is good, it is probably better than you think.

All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
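
The long-tail condition can be checked numerically for simple tails. The following is a minimal sketch, assuming a Pareto tail on [1, ∞) and an exponential tail as illustrative examples; the function names `pareto_tail`, `exponential_tail` and `conditional_exceedance` are hypothetical and not taken from any library.

```python
import math


def pareto_tail(x, alpha=2.0):
    """Tail Pr[X > x] of a Pareto distribution on [1, inf) with exponent alpha."""
    return x ** (-alpha) if x >= 1.0 else 1.0


def exponential_tail(x, rate=1.0):
    """Tail Pr[X > x] of an exponential distribution with the given rate."""
    return math.exp(-rate * x) if x >= 0.0 else 1.0


def conditional_exceedance(tail, x, t):
    """Pr[X > x + t | X > x], computed as tail(x + t) / tail(x)."""
    return tail(x + t) / tail(x)


if __name__ == "__main__":
    t = 5.0
    # Levels are kept moderate so the exponential tail does not underflow to 0.
    for x in (10.0, 100.0, 500.0):
        p = conditional_exceedance(pareto_tail, x, t)
        e = conditional_exceedance(exponential_tail, x, t)
        print(f"x={x:>6.0f}  Pareto: {p:.6f}  exponential: {e:.6f}")
```

For the Pareto tail the conditional probability equals ((x+t)/x)^{-\alpha} and tends to 1 as x grows, whereas for the exponential tail it stays at e^{-\mu t} for every x, which is the memoryless property and shows that the exponential distribution is not long-tailed.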

Subexponential distributions

Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed random variables X_1, X_2 with common distribution function F, the convolution of F with itself, F^{*2}, is defined using Lebesgue–Stieltjes integration by:


\Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_{- \infty}^\infty F(x-y)\,dF(y).

The n-fold convolution F^{*n} is defined in the same way. The tail distribution function \overline{F} is defined as \overline{F}(x) = 1-F(x).

A distribution F on the positive half-line is subexponential [1][3][4] if


\overline{F^{*2}}(x) \sim 2\overline{F}(x) \quad \mbox{as } x \to \infty.

This implies[5] that, for any n \geq 1,


\overline{F^{*n}}(x) \sim n\overline{F}(x) \quad \mbox{as } x \to \infty.

The probabilistic interpretation[5] of this is that, for a sum of n independent random variables X_1,\ldots,X_n with common distribution F,


\Pr[X_1+ \cdots +X_n>x] \sim \Pr[\max(X_1, \ldots,X_n)>x] \quad \text{as } x \to \infty.

This is often known as the principle of the single big jump[6] or catastrophe principle.[7]

A distribution F on the whole real line is subexponential if the distribution F I([0,\infty)) is.[8] Here I([0,\infty)) is the indicator function of the positive half-line. Alternatively, a random variable X supported on the real line is subexponential if and only if X^+ = \max(0,X) is subexponential.

All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
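
The principle of the single big jump can be illustrated by simulation. The following is a rough Monte Carlo sketch, assuming Pareto-distributed summands on [1, ∞) with exponent 1.5 (an arbitrary choice for illustration); `pareto_sample` and `big_jump_ratio` are hypothetical helper names, and the estimates are noisy at high levels because exceedances are rare.

```python
import random


def pareto_sample(rng, alpha):
    """Draw from a Pareto distribution on [1, inf) by inverse-transform sampling."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so u is never zero
    return u ** (-1.0 / alpha)


def big_jump_ratio(n_terms, level, alpha, n_trials, seed=0):
    """Estimate Pr[X_1 + ... + X_n > level] / Pr[max(X_1, ..., X_n) > level]."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(n_trials):
        xs = [pareto_sample(rng, alpha) for _ in range(n_terms)]
        if sum(xs) > level:
            sum_hits += 1
        if max(xs) > level:
            max_hits += 1
    if max_hits == 0:
        raise ValueError("no exceedances observed; increase n_trials or lower level")
    return sum_hits / max_hits


if __name__ == "__main__":
    for level in (10.0, 50.0, 200.0):
        r = big_jump_ratio(n_terms=2, level=level, alpha=1.5, n_trials=200_000)
        print(f"level={level:>6.0f}  estimated ratio={r:.3f}")
```

Because the summands are positive, the sum exceeds the level whenever the maximum does, so the estimated ratio is never below 1; for a subexponential distribution it should drift towards 1 as the level increases.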

Common heavy-tailed distributions

All commonly used heavy-tailed distributions are subexponential.[5]

Those that are one-tailed include:

Those that are two-tailed include:

Relationship to fat-tailed distributions

A fat-tailed distribution is a distribution for which the probability density function, for large x, goes to zero as a power x^{-a}. Since such a power is always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions however have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution. Many other heavy-tailed distributions such as the log-logistic and Pareto distribution are however also fat-tailed.
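
The log-normal case can be made explicit with a short calculation, given here as a sketch in terms of the density, as in the paragraph above. With parameters \mu \in \mathbb{R} and \sigma > 0, the log-normal density is

f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0.

For any power a > 0,

x^{a} f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left((a-1)\ln x - \frac{(\ln x - \mu)^2}{2\sigma^2}\right) \to 0 \quad \mbox{as } x \to \infty,

because the quadratic term in \ln x dominates the linear one, so the density decays faster than every power. For any \lambda > 0,

e^{\lambda x} f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(\lambda x - \ln x - \frac{(\ln x - \mu)^2}{2\sigma^2}\right) \to \infty \quad \mbox{as } x \to \infty,

because \lambda x grows faster than any power of \ln x, so the density decays more slowly than every exponential.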

Estimating the tail-index

There are parametric (see Embrechts et al.[5]) and non-parametric (see, e.g., Novak[13]) approaches to the problem of tail-index estimation.

To estimate the tail-index using the parametric approach, some authors fit a generalized extreme value (GEV) distribution or a Pareto distribution; they may apply the maximum-likelihood estimator (MLE).

Pickands' tail-index estimator

Let (X_n , n \geq 1) be a sequence of independent random variables with common distribution function F \in D(H(\xi)), the maximum domain of attraction[14] of the generalized extreme value distribution H, where \xi \in \mathbb{R}. If \lim_{n\to\infty} k(n) = \infty and \lim_{n\to\infty} \frac{k(n)}{n}= 0, then the Pickands tail-index estimator is[5][14]


\xi^{Pickands}_{(k(n),n)} =\frac{1}{\ln 2} \ln \left(  \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right)

where X_{(i,n)} denotes the i-th order statistic (the i-th smallest value) of X_1,\ldots,X_n, so that X_{(n,n)} is the sample maximum. This estimator converges in probability to \xi.
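
The Pickands formula can be sketched in code as follows. This is a minimal illustration, assuming that X_{(i,n)} is read as the i-th smallest of the n observations; `pickands_estimator` and `pareto_quantiles` are hypothetical names, and the "sample" used in the example is a deterministic grid of Pareto quantiles rather than real data.

```python
import math


def pickands_estimator(sample, k):
    """Pickands estimate of xi from upper order statistics of `sample`.

    Reads X_(i,n) as the i-th smallest observation (1-based), which is an
    assumption about the notation. Requires 1 <= k and 4*k <= len(sample),
    and distinct values among the order statistics used.
    """
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4*k <= len(sample)")
    xs = sorted(sample)  # xs[i - 1] is X_(i,n)
    a = xs[n - k]        # X_(n-k+1,n)
    b = xs[n - 2 * k]    # X_(n-2k+1,n)
    c = xs[n - 4 * k]    # X_(n-4k+1,n)
    return math.log((a - b) / (b - c)) / math.log(2.0)


def pareto_quantiles(n, alpha):
    """Deterministic stand-in for a sample: Pareto quantiles at levels i/(n+1)."""
    return [(1.0 - i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]


if __name__ == "__main__":
    data = pareto_quantiles(10_000, alpha=2.0)  # xi = 1/alpha = 0.5
    print(pickands_estimator(data, k=100))
```

On exact Pareto quantiles the ratio inside the logarithm equals 2^{\xi}, so the sketch recovers \xi = 1/\alpha up to rounding; on random samples the estimate fluctuates and depends noticeably on the choice of k.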

Hill's tail-index estimator

Let (X_n , n \geq 1) be a sequence of independent random variables with common distribution function F \in D(H(\xi)), the maximum domain of attraction of the generalized extreme value distribution H, where \xi > 0 (the heavy-tailed case). If \lim_{n\to\infty} k(n) = \infty and \lim_{n\to\infty} \frac{k(n)}{n}= 0, then the Hill tail-index estimator is[15]


\xi^{Hill}_{(k(n),n)} = \left(\frac{1}{k(n)} \sum_{i=n-k(n)+1}^{n} \ln(X_{(i,n)}) - \ln (X_{(n-k(n)+1,n)})\right)^{-1},

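where X_{(i,n)} denotes the i-th order statistic (the i-th smallest value) of X_1,\ldots,X_n. As written, with the outer inverse, this estimator converges in probability to the tail index \alpha = 1/\xi; the expression inside the parentheses, without the inverse, converges in probability to \xi. Under certain assumptions it is asymptotically normally distributed.[5]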
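
The Hill formula can be sketched in the same way. The code below follows the displayed expression, including its outer inverse, so the returned value is on the scale of the Pareto exponent \alpha and its reciprocal is on the scale of \xi = 1/\alpha. It assumes that X_{(i,n)} is the i-th smallest observation; `hill_estimator` and `pareto_quantiles` are hypothetical names, and the example data are deterministic Pareto quantiles rather than a real sample.

```python
import math


def hill_estimator(sample, k):
    """Inverse of the mean log-excess over the k largest observations.

    Reads X_(i,n) as the i-th smallest observation (1-based), which is an
    assumption about the notation. Requires 2 <= k <= len(sample), a positive
    threshold X_(n-k+1,n), and at least one observation above that threshold.
    """
    n = len(sample)
    if k < 2 or k > n:
        raise ValueError("need 2 <= k <= len(sample)")
    xs = sorted(sample)
    threshold = xs[n - k]  # X_(n-k+1,n)
    if threshold <= 0.0:
        raise ValueError("the threshold order statistic must be positive")
    # Average of ln X_(i,n) - ln X_(n-k+1,n) for i = n-k+1, ..., n.
    mean_log_excess = sum(math.log(x / threshold) for x in xs[n - k:]) / k
    if mean_log_excess == 0.0:
        raise ValueError("all of the k largest observations are equal")
    return 1.0 / mean_log_excess


def pareto_quantiles(n, alpha):
    """Deterministic stand-in for a sample: Pareto quantiles at levels i/(n+1)."""
    return [(1.0 - i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]


if __name__ == "__main__":
    data = pareto_quantiles(10_000, alpha=2.0)
    alpha_hat = hill_estimator(data, k=200)
    print(f"alpha estimate: {alpha_hat:.3f}, xi estimate: {1.0 / alpha_hat:.3f}")
```

In practice the estimate is usually examined over a range of k values, since too small a k gives high variance and too large a k brings in observations from outside the tail.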

Ratio estimator of the tail-index

The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith.[16] It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".

A comparison of Hill-type and RE-type estimators can be found in Novak.[13]
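
The article does not state the formula of the ratio estimator. The sketch below assumes one common form, the average of \ln(X_i/x) over the observations that exceed a fixed, non-random threshold x; this form, the function name `ratio_estimator` and the helper `pareto_quantiles` are assumptions made for illustration and should be checked against Goldie and Smith[16] or Novak[13] before use.

```python
import math


def ratio_estimator(sample, threshold):
    """Average of ln(X_i / threshold) over observations above a fixed threshold.

    Assumed form of the ratio estimator (not given in the article). The
    threshold plays the role of the non-random tuning parameter.
    """
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    log_excesses = [math.log(x / threshold) for x in sample if x > threshold]
    if not log_excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(log_excesses) / len(log_excesses)


def pareto_quantiles(n, alpha):
    """Deterministic stand-in for a sample: Pareto quantiles at levels i/(n+1)."""
    return [(1.0 - i / (n + 1)) ** (-1.0 / alpha) for i in range(1, n + 1)]


if __name__ == "__main__":
    data = pareto_quantiles(10_000, alpha=2.0)  # 1/alpha = 0.5
    print(ratio_estimator(data, threshold=5.0))
```

The difference from the Hill sketch is that the threshold here is fixed in advance, so the number of exceedances is random, whereas Hill's estimator fixes the number k of upper order statistics and lets the threshold be random.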

Software

  • aest, C tool for estimating the heavy-tail index.[17]

References

  1. [citation not available]
  2. Rolski, Schmidli, Schmidt, Teugels, Stochastic Processes for Insurance and Finance, 1999
  3. V. P. Chistyakov, A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes, Theory of Probability and Its Applications 1964 https://www.researchgate.net/publication/242637603_A_Theorem_on_Sums_of_Independent_Positive_Random_Variables_and_Its_Applications_to_Branching_Random_Processes
  4. J.L. Teugels, The Class of Subexponential Distributions, Annals of Probability 1975 http://projecteuclid.org/download/pdf_1/euclid.aop/1176996225
  5. Embrechts et al. [full citation not available]
  6. [citation not available]
  7. [citation not available]
  8. [citation not available]
  9. [citation not available]
  10. [citation not available]
  11. [citation not available]
  12. [citation not available]
  13. Novak. [full citation not available]
  14. [citation not available]
  15. Hill B.M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Statist., v. 3, 1163–1174.
  16. Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.
  17. [citation not available]