Pre-defined alphabets

Alphabets are used to map any kind of genetic data to values stored inside EggLib objects (Align, Container, and Site instances). The data can be single characters (e.g. DNA or proteins), strings of arbitrary length (e.g. insertion/deletion polymorphisms), symbolic strings (alleles not represented by their sequences), or integers (e.g. microsatellites, or polymorphisms that have been encoded beforehand). Character case can be either significant or ignored.

The alphabet determines which values are valid and can be considered as alleles in polymorphisms, and which values should be treated as missing data. Any other value is rejected and causes an error.
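The three-way split described above (allele, missing data, or rejected) can be illustrated with a small stand-alone sketch. This is not EggLib code: `ToyAlphabet`, its `classify` method, and the symbol sets used below are hypothetical names and values chosen only for illustration, and the real Alphabet class may behave differently.

```python
class ToyAlphabet:
    """Hypothetical stand-in for an alphabet; NOT EggLib's Alphabet class."""

    def __init__(self, exploitable, missing, case_insensitive=False):
        self.case_insensitive = case_insensitive
        # Values usable as alleles, and values treated as missing data.
        self.exploitable = {self._norm(v) for v in exploitable}
        self.missing = {self._norm(v) for v in missing}

    def _norm(self, value):
        # Fold case only for string values, and only when requested.
        if self.case_insensitive and isinstance(value, str):
            return value.upper()
        return value

    def classify(self, value):
        """Return 'allele' or 'missing'; raise ValueError for anything else."""
        v = self._norm(value)
        if v in self.exploitable:
            return "allele"
        if v in self.missing:
            return "missing"
        raise ValueError(f"invalid value for this alphabet: {value!r}")


# Single characters, case ignored (illustrative symbol sets, not EggLib's).
toy_dna = ToyAlphabet("ACGT", "N?-", case_insensitive=True)

# Integers, e.g. microsatellite allele sizes (sizes and the -1 code are made up).
toy_msat = ToyAlphabet([120, 122, 124], [-1])
```

With these toy objects, `toy_dna.classify("a")` is treated as an allele because case is ignored, `toy_msat.classify(-1)` is treated as missing data, and `toy_dna.classify("X")` raises a `ValueError`.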

The Alphabet class (available in the global namespace) allows custom alphabets to be defined, while this module contains pre-defined alphabets. Many functions that accept a particular type of data require the matching alphabet from this list. Furthermore, data processing is faster with the pre-defined alphabets, especially for DNA sequences.

egglib.alphabets.DNA: Alphabet optimized for DNA sequences (case-insensitive).

egglib.alphabets.protein: Alphabet for amino acids (only upper case). Stop codons are supported as *.

egglib.alphabets.codons: Alphabet for codon triplets (only upper case).

egglib.alphabets.positive_infinite: Alphabet with all positive integer values.

egglib.alphabets.binary: Alphabet for binary data (0/1 integers).

egglib.alphabets.genepop: Alphabet matching the Genepop format.