Diversity statistics¶
In the module stats, a number of tools are provided to
compute diversity statistics out of Site
or Align
instances. Some statistics are applicable to individual sites, some
to sets of sites, and some to phased sequences alignments. Note that
the objects may indifferently contain nucleotide sequences, protein
sequences, microsatellite alleles encoded (or not) as allele length, or
any arbitrary representation of allelic diversity.
The alphabets define lists of alleles and their representation, but have not influence regarding what statistics can be computed or not. What is important to note that EggLib will compute any statistic you request out of your data, even if it is meaningless. Special attention should be granted to statistics requiring a phase, since you can easily load unphased data to objects that can be used to compute those statistics.
In many cases, not computable statistics are returned as None
, but
this is only when they are technically not computable (due to missing
data or unvailability of a specific feature such as outgroup sequences
or subpopulations).
In the sections of this chapter, we will present statistics available
in the stats
module. Statistics will be grouped by families (a
family of statistics being a group of statistics that require the same
type of data and the same kind of information). Most of the statistics
are computed by stats.ComputeStats
(see this tutorial
section for an introduction) and the others by
other functions available in the same module.
Outgroup¶
Some of the statistics require an outgroup to be computed. The outgroup
should be included in the analysed dataset (Site
or
Align
instance) and identified by the means of a
Structure
instance. There might be more than one outgroup
samples. The ougroup information will be used to identify the ancestral
variant (that is, the one which is shared with the outgroup) if the
outgroup has one of the alleles present in the main sample (the
ingroup), this allele will be considered to be ancestral. If there are
several outgroup samples, all of them are expected to have the same
allele (if they are non-missing at this position). If the outgroup has
an allele not found in the outgroup, or if the outgroup contains several
alleles, then the site will be considered not orientable and won’t be
used for statistics requiring an outgroup. Statistics not requiring an
outgroup will be computed normally, though.
Population structure¶
Many statistics require that several populations are present, some
require that an individual structure is defined, and one statistic
(FisctWC
) clusters of populations in addition to populations and
individuals. Like the outgroup, the structure of samples is described
by Structure
instances (see here for an
introduction). If the appropriate level of structure is not defined in
the Structure
provided to the class or function computing
statistics (or if no Structure
is provided), the concerned
statistics will be None
.
Here is the list of families of statistics that are described in the following sections: