Fixation index

From Infogalactic: the planetary knowledge core

The fixation index (FST) is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNPs) or microsatellites. Developed as a special case of Wright's F-statistics, it is one of the most commonly used statistics in population genetics.

Definition

Two of the most commonly used definitions for FST at a given locus are based on the variance of allele frequencies between populations and on the probability of identity by descent.

If \bar{p} is the average frequency of an allele in the total population, \sigma^2_S is the variance in the frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations, and \sigma^2_T is the variance of the allelic state in the total population, FST is defined as [1]

 F_{ST} = \frac{\sigma^2_S}{\sigma^2_T} = \frac{\sigma^2_S}{\bar{p}(1-\bar{p})}

Wright's definition illustrates that FST measures the amount of genetic variance that can be explained by population structure. This can also be thought of as the fraction of total diversity that is not a consequence of the average diversity within subpopulations, where diversity is measured by the probability that two randomly selected alleles are different, namely 2p(1-p). If the allele frequency in the ith population is p_i and the relative size of the ith population is c_i, then

 F_{ST} = \frac{\bar{p}(1-\bar{p})-\sum c_i p_i(1-p_i)}{\bar{p}(1-\bar{p})} = \frac{\bar{p}(1-\bar{p})- \overline{p(1-p)}}{\bar{p}(1-\bar{p})}
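The two forms above are algebraically equivalent, which can be checked numerically. The following is a minimal sketch for a single biallelic locus, assuming the subpopulation allele frequencies p_i and relative sizes c_i are known exactly (in practice they must be estimated). The function name `fst_from_frequencies` and its arguments are illustrative and not taken from any library.

```python
def fst_from_frequencies(freqs, weights=None):
    """Return F_ST computed two ways (variance form, heterozygosity form).

    freqs   -- allele frequency p_i in each subpopulation
    weights -- relative subpopulation sizes c_i (defaults to equal sizes)
    """
    n = len(freqs)
    if weights is None:
        weights = [1.0] * n
    total = float(sum(weights))
    c = [w / total for w in weights]  # normalise so the c_i sum to 1

    # Average allele frequency in the total population.
    p_bar = sum(ci * pi for ci, pi in zip(c, freqs))
    total_var = p_bar * (1.0 - p_bar)
    if total_var == 0.0:
        raise ValueError("locus is monomorphic; F_ST is undefined")

    # Variance form: sigma^2_S / (p_bar * (1 - p_bar)).
    var_s = sum(ci * (pi - p_bar) ** 2 for ci, pi in zip(c, freqs))
    fst_variance = var_s / total_var

    # Heterozygosity form: total diversity minus mean within-subpopulation diversity.
    within = sum(ci * pi * (1.0 - pi) for ci, pi in zip(c, freqs))
    fst_heterozygosity = (total_var - within) / total_var

    return fst_variance, fst_heterozygosity


if __name__ == "__main__":
    # Two equally sized subpopulations with frequencies 0.2 and 0.8.
    print(fst_from_frequencies([0.2, 0.8]))
```

With frequencies 0.2 and 0.8 in two equally sized subpopulations, both forms give an FST of about 0.36. Identical frequencies give 0, and frequencies of 0 and 1 give 1.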

Alternatively,[2]

 F_{ST} = \frac{f_0-\bar{f}}{1-\bar{f}}

where f_0 is the probability of identity by descent of two individuals given that the two individuals are in the same subpopulation, and \bar{f} is the probability that two individuals from the total population are identical by descent. Using this definition, FST can be interpreted as measuring how much more closely related two individuals from the same subpopulation are than two individuals drawn from the total population. If the mutation rate is small, this interpretation can be made more explicit by linking the probability of identity by descent to coalescent times: let T_0 and T denote the average time to coalescence for individuals from the same subpopulation and the total population, respectively. Then,

 F_{ST} \approx 1-\frac{T_0}{T}

This formulation has the advantage that the expected time to coalescence can easily be estimated from genetic data, which led to the development of various estimators for FST.
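A sketch of why the approximation holds, under an assumption not stated in the text: every mutation creates a new allele, at a per-generation rate \mu (a symbol introduced here for illustration). Two lineages that coalesce t generations ago are then identical by descent only if neither has mutated, which has probability about e^{-2\mu t} \approx 1 - 2\mu t when \mu t is small. Averaging over coalescence times gives:

```latex
f_0 \approx 1 - 2\mu T_0, \qquad \bar{f} \approx 1 - 2\mu T
\quad\Longrightarrow\quad
F_{ST} = \frac{f_0-\bar{f}}{1-\bar{f}}
\approx \frac{2\mu T - 2\mu T_0}{2\mu T}
= 1-\frac{T_0}{T}
```

The mutation rate cancels, which is why the result depends only on the ratio of the two coalescence times.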

Estimation

In practice, none of the quantities used for the definitions can be easily measured. As a consequence, various estimators have been proposed. A particularly simple estimator applicable to DNA sequence data is:[3]

 F_{ST} = \frac{ \pi_\text{Between} - \pi_\text{Within} } { \pi_\text{Between} }

where \pi_\text{Between} and \pi_\text{Within} represent the average number of pairwise differences between two individuals sampled from different sub-populations (\pi_\text{Between}) or from the same sub-population (\pi_\text{Within}). The average pairwise difference within a population can be calculated as the sum of the pairwise differences divided by the number of pairs. However, this estimator is biased when sample sizes are small or vary between populations. Therefore, more elaborate methods are used to compute FST in practice. Two of the most widely used procedures are the estimator of Weir & Cockerham (1984)[4] and analysis of molecular variance (AMOVA). A list of implementations is available at the end of this article.
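The simple estimator can be sketched for two samples of aligned sequences as follows. This is an illustration under stated assumptions, not a recommended implementation: sequences are equal-length strings, differences are counted site by site, and \pi_\text{Within} is taken as the unweighted mean of the two within-population averages (one of several possible conventions). All function names are hypothetical.

```python
from itertools import combinations, product


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pi_within(pop):
    """Sum of pairwise differences within one sample divided by the number of pairs."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("need at least two sequences per population")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def pi_between(pop1, pop2):
    """Average pairwise differences over all pairs with one sequence from each sample."""
    pairs = list(product(pop1, pop2))
    if not pairs:
        raise ValueError("both populations must be non-empty")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def fst_pairwise(pop1, pop2):
    """Simple estimator: (pi_between - pi_within) / pi_between."""
    within = (pi_within(pop1) + pi_within(pop2)) / 2.0  # assumed averaging convention
    between = pi_between(pop1, pop2)
    if between == 0:
        raise ValueError("no differences between populations; estimator undefined")
    return (between - within) / between


if __name__ == "__main__":
    pop_a = ["AAAA", "AAAT"]
    pop_b = ["TTTT", "TTTA"]
    print(fst_pairwise(pop_a, pop_b))
```

In this toy example the within-population average is 1 difference and the between-population average is 3.5, giving an estimate of 2.5/3.5, or about 0.71. As noted above, this estimator is biased for small or unequal samples, so the sketch is for intuition only.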

Interpretation

This comparison of genetic variability within and between populations is frequently used in applied population genetics. The values range from 0 to 1. A value of zero implies complete panmixis, that is, that the two populations are interbreeding freely. A value of one implies that all genetic variation is explained by the population structure and that the two populations do not share any genetic diversity.

For idealized models such as Wright's finite island model, FST can be used to estimate migration rates. Under that model, the migration rate is

\hat{M}\approx\frac{1}{2}\left (\frac{1}{F_{ST}} -1  \right ) .
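As a small numerical illustration, the formula above and its algebraic inverse can be written as two functions. This sketch implements the expression exactly as printed. The text does not define how M is scaled (for example, in terms of effective population size and per-generation migration rate), so no scaling is assumed here, and the function names are hypothetical.

```python
def migration_from_fst(fst):
    """Estimate M from F_ST using M ~ (1/2) * (1/F_ST - 1), as given in the text."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1] for this formula")
    return 0.5 * (1.0 / fst - 1.0)


def fst_from_migration(m):
    """Algebraic inverse of the formula above: F_ST ~ 1 / (1 + 2M)."""
    if m < 0.0:
        raise ValueError("M must be non-negative")
    return 1.0 / (1.0 + 2.0 * m)


if __name__ == "__main__":
    for fst in (0.01, 0.12, 0.5):
        print(fst, migration_from_fst(fst))
```

For example, an FST of 0.2 corresponds to M of about 2, and M tends to infinity as FST approaches zero. The relationship holds only under the idealized island model, so values obtained this way are best read as rough indications.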

The interpretation of FST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and FST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data. Also, strictly speaking, FST is not a genetic distance, as it does not satisfy the triangle inequality. As a consequence, new tools for measuring genetic differentiation continue to be developed.

FST in humans

Autosomal genetic distances based on classical markers

In their study The History and Geography of Human Genes (1994), Cavalli-Sforza, Menozzi and Piazza provide some of the most detailed and comprehensive estimates of genetic distances between human populations, within and across continents. Their initial database contains 76,676 gene frequencies (using 120 blood polymorphisms), corresponding to 6,633 samples in different locations. By culling and pooling such samples, they restrict their analysis to 491 populations. They focus on aboriginal populations that were at their present location at the end of the 15th century, when the great European migrations began.[5]

When studying genetic differences at the world level, the number is reduced to 42 representative populations, aggregating subpopulations characterized by a high level of genetic similarity. For these 42 populations, Cavalli-Sforza and coauthors report bilateral distances computed from 120 alleles. Among this set of 42 world populations, the greatest genetic distance observed is between Mbuti Pygmies and Papua New Guineans, where the FST distance is 0.4573, while the smallest genetic distance (0.0021) is between the Danish and the English. When considering more disaggregated data for 26 European populations, the smallest genetic distance (0.0009) is between the Dutch and the Danes, and the largest (0.0667) is between the Lapps and the Sardinians. The mean genetic distance among the 861 available pairs in the world population is 0.1338.

The table below lists FST values reported by Cavalli-Sforza et al. (1994) for selected populations. The entries are FST multiplied by 10,000, so that, for example, 21 corresponds to the Danish-English distance of 0.0021.

FST x 10,000 (Cavalli-Sforza et al. 1994)

Population      W.African  Berber  Indian  Iranian  Near Eastern  Japanese  Basque  Lapp  Sardinian  Danish  English  Greek  Italian
W.African               0    1642    1748     1796          1454      2252    1299  1689       2062    1459     1487   1356     1794
Berber               1642       0     497      408           263      1707     392   736        619     313      273    429      315
Indian               1748     497       0      154           229       718     418   459        449     293      280    272      261
Iranian              1796     408     154        0           158      1059     285   423        314     179      197     70      133
Near Eastern         1454     263     229      158             0      1056     246   423        329     238      236    129      208
Japanese             2252    1707     718     1059          1056         0    1481   947       1558    1176     1244   1175     1145
Basque               1299     392     418      285           246      1481       0   629        348     184      119    231      141
Lapp                 1689     736     459      423           423       947     629     0        667     334      404    308      339
Sardinian            2062     619     449      314           329      1558     348   667          0     348      340    190      221
Danish               1459     313     293      179           238      1176     184   334        348       0       21    191       72
English              1487     273     280      197           236      1244     119   404        340      21        0    204       51
Greek                1356     429     272       70           129      1175     231   308        190     191      204      0       77
Italian              1794     315     261      133           208      1145     141   339        221      72       51     77        0

Autosomal genetic distances based on SNPs

More recently, the International HapMap Project estimated FST for three human populations using SNP data. Across the autosomes, FST was estimated to be 0.12. The significance of this value in humans is contentious. Because an FST of zero indicates no divergence between populations, whereas an FST of one indicates complete isolation of populations, anthropologists often cite Lewontin's 1972 work, which arrived at a similar value and interpreted it as meaning that there is little biological difference between human races.[6] On the other hand, while an FST value of 0.12 is lower than that found between populations of many other species, Henry Harpending argued that this value implies that, on a world scale, "kinship between two individuals of the same human population is equivalent to kinship between grandparent and grandchild or between half siblings". In fact, the formulas derived in the 'Kinship in a subdivided population' section of Harpending's paper imply that two unrelated individuals of the same race have a higher coefficient of kinship (0.125) than an individual and their mixed-race half-sibling (0.109).[7]

Intercontinental autosomal genetic distances based on SNPs[8]

Population                   Europe (CEU)  Sub-Saharan Africa (Yoruba)  East-Asia (Japanese)
Sub-Saharan Africa (Yoruba)         0.153
East-Asia (Japanese)                0.111                        0.190
East-Asia (Chinese)                 0.110                        0.192                 0.007

Intra-European/Mediterranean autosomal genetic distances based on SNPs[8][9]
Italians Palestinians Swedish Finns Spanish Germans Russians French Greeks
Palestinians 0.0064
Swedish 0.0064-0.0090 0.0191
Finns 0.0130-0.0230 0.0050-0.0110
Spanish 0.0010-0.0050 0.0101 0.0040-0.0055 0.0110-0.0170
Germans 0.0029-0.0080 0.0136 0.0007-0.0010 0.0060-0.0130 0.0015-0.0030
Russians 0.0088-0.0120 0.0202 0.0030-0.0036 0.0060-0.0120 0.0070-0.0079 0.0030-0.0037
French 0.0030-0.0050 0.0020 0.0080-0.0150 0.0010 0.0010 0.0050
Greeks 0.0000 0.0057 0.0084 0.0035 0.0039 0.0108

Programs for calculating FST

Modules for calculating FST

References

  1. (citation missing)
  2. (citation missing)
  3. (citation missing)
  4. Weir & Cockerham, 1984
  5. Cavalli-Sforza et al., 1994, p. 24
  6. Lewontin, 1972
  7. Harpending (full citation missing)
  8. (citation missing), see table
  9. (citation missing), see table

Further reading

  • Evolution and the Genetics of Populations Volume 2: the Theory of Gene Frequencies, pg 294–295, S. Wright, Univ. of Chicago Press, Chicago, 1969
  • A haplotype map of the human genome, The International HapMap Consortium, Nature 2005

External links