Allele size statistics

The following statistics make use of the allele size. They are available from all methods of ComputeStats(), but require an alphabet in which alleles are coded as integers. The allele values are interpreted as allele sizes. Negative values are accepted: in that case it is assumed that the allele sizes are shifted, which does not change any of these statistics because they depend only on differences between sizes.

| Code | Definition | Equation |
|------|------------|----------|
| V | Allele size variance | (1) |
| Ar | Allele range | (2) |
| M | Garza and Williamson's M | (3) |
| Rst | Slatkin's $$R_{ST}$$ | (4) |

If $$k$$ is the number of alleles, $$A_i$$ is the size of allele $$i$$, $$p_i$$ is the absolute frequency (number of copies) of this allele and $$n = \sum p_i$$ is the number of samples, the allele size variance is computed as a sample variance:

(1) $V = \frac{n}{n-1} \left[ \frac{1}{n} \sum_{i=1}^{k} p_i A_i^2 - \left( \frac{1}{n} \sum_{i=1}^{k} p_i A_i \right)^2 \right]$
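As an illustration of equation (1), here is a minimal pure-Python sketch. It does not use the library: the function name `allele_size_variance` and the input format (one integer allele size per sample) are assumptions made for this example only.

```python
from collections import Counter


def allele_size_variance(sizes):
    """Sample variance of allele sizes, following equation (1).

    `sizes` holds one integer allele size per sample (hypothetical input
    format chosen for this sketch).
    """
    counts = Counter(sizes)       # maps each allele size A_i to its count p_i
    n = sum(counts.values())      # n = sum of the p_i = number of samples
    if n < 2:
        raise ValueError("at least 2 samples are required")
    mean = sum(p * a for a, p in counts.items()) / n         # (1/n) sum p_i A_i
    mean_sq = sum(p * a * a for a, p in counts.items()) / n  # (1/n) sum p_i A_i^2
    return n / (n - 1) * (mean_sq - mean ** 2)


sizes = [10, 10, 12, 14]
v = allele_size_variance(sizes)
print(v)
```

The factor $$n/(n-1)$$ turns the population variance inside the brackets into the unbiased sample variance, so the result should agree with `statistics.variance` applied to the same list.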

The allele range is simply:

(2) $A_R = \max(A_i) - \min(A_i)$

Garza and Williamson’s $$M$$ (Mol. Ecol. 2001 10:305-318) is computed as:

(3) $M = \frac{k}{A_R+1}$
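Equations (2) and (3) can be sketched in a few lines of plain Python. The function names and the list-of-sizes input are assumptions made for illustration and are not part of the library.

```python
def allele_range(sizes):
    """Allele range A_R: largest allele size minus smallest, equation (2)."""
    return max(sizes) - min(sizes)


def garza_williamson_m(sizes):
    """Garza and Williamson's M = k / (A_R + 1), equation (3)."""
    k = len(set(sizes))  # k: number of distinct alleles
    return k / (allele_range(sizes) + 1)


sizes = [10, 10, 12, 14]          # one integer allele size per sample
a_r = allele_range(sizes)
m = garza_williamson_m(sizes)
print(a_r, m)
```

In this example there are 3 distinct alleles spanning 5 possible sizes (10 to 14), so $$M = 3/5$$. When every integer size between the minimum and the maximum is present, $$M$$ equals 1.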

Finally, Slatkin’s $$R_{ST}$$ (Genetics 1995 139:457-462) is computed as shown below, considering only samples from populations with at least 2 samples. $$\bar{V}$$ is the average of the within-population allele size variances. Note that when the value is computed for several sites, $$R_{ST}$$ is the ratio of the sums of the per-site terms, not the average of the per-site values.

(4) $R_{ST} = 1 - \frac{\bar{V}}{V}$
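The following pure-Python sketch illustrates equation (4) and the multi-site rule. It rests on several assumptions that the text above does not settle: $$V$$ is taken as the sample variance over the pooled samples of the retained populations, $$\bar{V}$$ is an unweighted mean of the per-population variances, and "ratio of the sums" is read as summing $$\bar{V}$$ and $$V$$ separately over sites. The function names and the nested-list input are hypothetical.

```python
from statistics import variance


def rst_terms(populations):
    """Return (V_bar, V) for one site.

    `populations` is a list of lists of integer allele sizes, one inner
    list per population (hypothetical input format).
    """
    # Only populations with at least 2 samples are considered.
    kept = [pop for pop in populations if len(pop) >= 2]
    if not kept:
        raise ValueError("no population with at least 2 samples")
    # Assumption: unweighted mean of the within-population variances.
    v_bar = sum(variance(pop) for pop in kept) / len(kept)
    # Assumption: V is the variance over all retained samples pooled.
    pooled = [size for pop in kept for size in pop]
    return v_bar, variance(pooled)


def rst(sites):
    """R_ST over one or more sites, as 1 - sum(V_bar) / sum(V)."""
    terms = [rst_terms(site) for site in sites]
    sum_v_bar = sum(t[0] for t in terms)
    sum_v = sum(t[1] for t in terms)
    if sum_v == 0:
        raise ValueError("R_ST is undefined when the total variance is 0")
    return 1 - sum_v_bar / sum_v


site = [[10, 10, 12], [14, 14, 16], [20]]  # the last population is ignored
print(rst([site]))
```

Summing the terms before dividing gives sites with a larger total variance more weight than averaging the per-site $$R_{ST}$$ values would.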