Pre-defined alphabets

Alphabets are used to map any kind of genetic data to values stored inside EggLib objects (Align, Container, and Site instances). The data can be single characters (e.g. DNA or proteins), strings of arbitrary length (e.g. insertion/deletion polymorphisms), symbolic strings (alleles not represented by their sequences), or integers (e.g. microsatellites, or polymorphisms that have been encoded beforehand). An alphabet can be case-sensitive or case-insensitive.

The alphabet determines which values are valid and can be considered as alleles in polymorphisms, and which values should be treated as missing data. Any other value is rejected and causes an error.
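The three-way distinction described above (valid allele, missing data, rejected value) can be illustrated with a minimal pure-Python sketch. This is not EggLib code: `ToyAlphabet`, `Kind`, and `classify` are hypothetical names introduced only for illustration, and the symbol sets are arbitrary and do not reproduce the content of any pre-defined alphabet.

```python
from enum import Enum


class Kind(Enum):
    EXPLOITABLE = "exploitable"  # valid allele
    MISSING = "missing"          # treated as missing data


class ToyAlphabet:
    """Hypothetical stand-in for an alphabet (not EggLib's implementation)."""

    def __init__(self, exploitable, missing, case_insensitive=False):
        self.case_insensitive = case_insensitive
        self.exploitable = {self._norm(v) for v in exploitable}
        self.missing = {self._norm(v) for v in missing}

    def _norm(self, value):
        # Fold case only for string values in a case-insensitive alphabet.
        if self.case_insensitive and isinstance(value, str):
            return value.upper()
        return value

    def classify(self, value):
        key = self._norm(value)
        if key in self.exploitable:
            return Kind.EXPLOITABLE
        if key in self.missing:
            return Kind.MISSING
        # Anything else is outside the alphabet and is rejected.
        raise ValueError(f"value not in alphabet: {value!r}")


# Illustrative symbol sets only (assumption, not the real DNA alphabet).
toy_dna = ToyAlphabet("ACGT", "N?", case_insensitive=True)
print(toy_dna.classify("a"))
print(toy_dna.classify("N"))
```

In this sketch, a value outside both sets (for example `"Z"`) raises `ValueError`, which mirrors the rule that values unknown to the alphabet cause an error rather than being silently accepted.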

The Alphabet class (available in the global namespace) lets you define custom alphabets, while this module contains pre-defined alphabets. Many functions that accept a particular type of data require the matching alphabet from this list. Data processing is also faster with these alphabets, especially with DNA sequences.

egglib.alphabets.DNA

Alphabet optimized for DNA sequences (case-insensitive).

egglib.alphabets.protein

Alphabet for amino acids (only upper case). Stop codons are supported as *.

egglib.alphabets.codons

Alphabet for codon triplets (only upper case).

egglib.alphabets.positive_infinite

Alphabet with all positive integer values.

egglib.alphabets.binary

Alphabet for binary data (0/1 integers).

egglib.alphabets.genepop

Alphabet matching the Genepop format.