Zipf–Mandelbrot law

In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.

The probability mass function is given by:

$$f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}$$

where $H_{N,q,s}$ is given by:

$$H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}$$

which may be thought of as a generalization of a harmonic number. In the formula, $k$ is the rank of the data, and $q$ and $s$ are parameters of the distribution. In the limit as $N$ approaches infinity (with $s>1$, so that the sum converges), this becomes the Hurwitz zeta function $\zeta(s,q+1)$ under the convention $\zeta(s,a)=\sum_{n=0}^{\infty}(n+a)^{-s}$. For finite $N$ and $q=0$ the Zipf–Mandelbrot law becomes Zipf's law. For infinite $N$ and $q=0$ it becomes a Zeta distribution.
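The definitions above translate directly into code. The following is a minimal Python sketch, not a reference implementation: the function names `generalized_harmonic` and `zipf_mandelbrot_pmf` and the parameter values are chosen here purely for illustration.

```python
def generalized_harmonic(n, q, s):
    """Return H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(k, n, q, s):
    """Return f(k; n, q, s) for rank k; zero outside the support 1..n."""
    if not 1 <= k <= n:
        return 0.0
    return (1.0 / (k + q) ** s) / generalized_harmonic(n, q, s)


# Illustrative (assumed) parameter values, not taken from any data set.
N, q, s = 10, 2.7, 1.2
probs = [zipf_mandelbrot_pmf(k, N, q, s) for k in range(1, N + 1)]
total = sum(probs)  # should be 1 up to floating-point rounding

# With q = 0 the same function gives Zipf's law.
zipf_first = zipf_mandelbrot_pmf(1, 3, 0.0, 1.0)  # 1 / (1 + 1/2 + 1/3)
```

Recomputing the normalizing sum on every call keeps the sketch short; for repeated evaluation one would typically compute $H_{N,q,s}$ once and reuse it.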

Applications

The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.

If one plots the frequency rank of words contained in a moderately sized corpus of text data versus the number of occurrences or actual frequencies, one obtains a power-law distribution with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size, because the harmonic series with $s=1$ does not converge, whereas the Zipf–Mandelbrot generalization with $s>1$ does. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.^[1]
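The convergence remark can be checked numerically. The sketch below compares partial sums of the normalizing series for $s=1$ and for one exponent above one; the choice $s=2$, $q=0$ and the helper name `partial_sum` are assumptions made for illustration only.

```python
import math


def partial_sum(n, q, s):
    """Partial sum of 1 / (i + q)^s for i = 1..n."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


# Vocabulary sizes to compare (illustrative values).
sizes = (10**2, 10**3, 10**4, 10**5)

# s = 1: the harmonic series keeps growing, roughly like ln(n).
harmonic = [partial_sum(n, 0.0, 1.0) for n in sizes]

# s = 2: the series settles toward a finite limit (pi^2 / 6 when q = 0).
convergent = [partial_sum(n, 0.0, 2.0) for n in sizes]

for n, h, c in zip(sizes, harmonic, convergent):
    print(n, h, c)
```

Each tenfold increase in the vocabulary size adds about $\ln 10$ to the $s=1$ column, while the $s=2$ column barely moves.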

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.^[2]

Within music, many metrics used to measure "pleasing" music conform to Zipf–Mandelbrot distributions.^[3]

Parameters	$N \in \{1,2,3,\ldots\}$ (integer); $q \in [0,\infty)$ (real); $s>0$ (real)
Support	$k \in \{1,2,\ldots,N\}$
pmf	$\frac{1/(k+q)^s}{H_{N,q,s}}$
CDF	$\frac{H_{k,q,s}}{H_{N,q,s}}$
Mean	$\frac{H_{N,q,s-1}}{H_{N,q,s}}-q$
Mode	$1$
Entropy	$\frac{s}{H_{N,q,s}}\sum_{k=1}^N\frac{\ln(k+q)}{(k+q)^s}+\ln(H_{N,q,s})$
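
The closed forms in the summary table can be compared against direct computation from the pmf. The following Python sketch does this for one arbitrarily chosen parameter set; the helper name `H`, the parameter values and the test rank `k0` are illustrative assumptions.

```python
import math


def H(n, q, s):
    """Generalized harmonic number H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


# Illustrative (assumed) parameters.
N, q, s = 50, 1.5, 1.3
ranks = range(1, N + 1)
norm = H(N, q, s)
pmf = [(1.0 / (k + q) ** s) / norm for k in ranks]

# Mean: direct sum versus H_{N,q,s-1} / H_{N,q,s} - q.
mean_direct = sum(k * p for k, p in zip(ranks, pmf))
mean_closed = H(N, q, s - 1) / norm - q

# Entropy: -sum p ln p versus the tabulated expression.
entropy_direct = -sum(p * math.log(p) for p in pmf)
entropy_closed = (s / norm) * sum(
    math.log(k + q) / (k + q) ** s for k in ranks
) + math.log(norm)

# CDF at an arbitrary rank k0: cumulative pmf versus H_{k0,q,s} / H_{N,q,s}.
k0 = 7
cdf_direct = sum(pmf[:k0])
cdf_closed = H(k0, q, s) / norm

# Mode: the rank with the largest probability.
mode = max(ranks, key=lambda k: pmf[k - 1])
```

The mean identity follows from writing $k = (k+q) - q$ inside the sum, and the entropy identity from $-\ln f(k) = s\ln(k+q) + \ln H_{N,q,s}$.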