Sequence manipulation tools¶

This module provides a number of standalone tools for manipulating sequences.

`egglib.tools.ReadingFrame`([frame, ...])	Handle reading frame positions.
`egglib.tools.to_codons`(src[, frame])	Converts a DNA object to the codon alphabet.
`egglib.tools.to_bases`(src)	Converts a codon object to the DNA alphabet.
`egglib.tools.concat`(align1, align2, ...[, ...])	Concatenates sequence alignments.
`egglib.tools.ungap`(align, freq[, gaps])	Remove sites with too many gaps.
`egglib.tools.ungap_all`(obj[, gaps])	Remove all gaps from an object.
`egglib.tools.rc`(seq)	Reverse-complement a DNA sequence.
`egglib.tools.compare`(seq1, seq2)	Compare two sequences.
`egglib.tools.regex`(query[, both_strands])	Turn a DNA sequence into a regular expression.
`egglib.tools.motif_iter`(subject, query[, ...])	Iterate over hits of a given query.
`egglib.tools.translate`(seq[, code, ...])	Translates coding nucleotide sequences to proteins.
`egglib.tools.Translator`([code])	Class to translate nucleotide sequences to proteins.
`egglib.tools.orf_iter`(sequence[, code, ...])	Iterate over open reading frames.
`egglib.tools.longest_orf`(sequence[, code, ...])	Detect the longest open reading frame.
`egglib.tools.backalign`(nucl, aln[, code, ...])	Align coding sequence based on the protein alignment.
`egglib.tools.trailing_stops`(align[, action, ...])	Process stop codons at the end of the sequences.
`egglib.tools.iter_stops`(src[, code])	Iterate over stop codons.
`egglib.tools.has_stop`(src[, code])	Test if there is at least one stop codon.

class egglib.tools.ReadingFrame(frame=None, keep_truncated=False)[source]¶

Handle reading frame positions. The reading frame positions can be loaded as constructor argument or using the method process(). By default, build an instance with no exons.

Parameters:

frame –
the reading frame specification must be a sequence of (start, stop[, codon_start]) pairs or triplets where start and stop give the limits of an exon, such that sequence[start:stop] returns the exon sequence, and codon_start, if specified, can be:
- 1 if the first position of the exon is the first position of a codon (e.g. ATG ATG),
- 2 if the first position of the segment is the second position of a codon (e.g. TG ATG),
- 3 if the first position of the segment is the third position a of codon (e.g. G ATG),
- None if the reading frame is continuing the previous exon.
keep_truncated – if True, codons that are truncated (either because the reading frame is 5’, internally, or 3’ partial, or if the number of bases is not a multiple of three) are not skipped (missing positions of truncated codons are replaced by None).

If codon_start of the first segment is None, 1 will be assumed. If codon_start of any non-first segment is not None, the reading frame is supposed to be interupted. This means that if any codon was not completed at the end of the previous exon, it will assumed to be assumed incomplete, and the first codon of the current codon will be incomplete as well.

Attributes

`num_codons`	Number of codons.
`num_exon_bases`	Number of bases in exons.
`num_exons`	Number of exons.
`num_needed_bases`	Number of bases needed to apply this reading frame.
`num_tot_bases`	Number of bases of the reading frame.

Methods

`codon_bases`(codon)	Give the position of the three bases of a given codon.
`codon_index`(base)	Find the codon in which a given base falls.
`codon_position`(base)	Tell the position of a base within the codon in which it falls.
`exon_index`(base)	Find the exon in which a given base falls.
`iter_codons`()	Iterate over codons.
`iter_exon_bounds`()	Iterate over exons.
`process`(frame[, keep_truncated])	Reset instance with a new reading frame.

codon_bases(codon)[source]¶

Give the position of the three bases of a given codon. One or two positions (but never the middle one alone) will be None if the codon is truncated (beginning/end of an exon without coverage of the previous/next one). If the codon index is out of range, return None.

Parameters:: codon – any codon index.
Returns:: A tuple with the three base positions (containing potentially one or two None) or None if the codon index is out of range.

codon_index(base)[source]¶

Find the codon in which a given base falls.

Parameters:: base – any base index.
Returns:: The index of the corresponding codon, or None if the base does not fall in any codon.

codon_position(base)[source]¶

Tell the position of a base within the codon in which it falls.

Parameters:: base – any base index.
Returns:: The index of the base in the codon (0, 1 or 3), or None if the base does not fall in any codon.

exon_index(base)[source]¶

Find the exon in which a given base falls.

Parameters:: base – any base index.
Returns:: The index of the corresponding exon, or None if the base does not fall in any exon.

iter_codons()[source]¶: Iterate over codons. This iterator returns (first, second, third) tuples of the positions of the three bases of each codon. Positions might be None if keep_truncated has been set to True.

iter_exon_bounds()[source]¶: Iterate over exons. This iterator returns (start, stop) tuples of the positions of the limits of each exon.

property num_codons¶: Number of codons. Truncated codons are included.

property num_exon_bases¶: Number of bases in exons.

property num_exons¶: Number of exons.

property num_needed_bases¶: Number of bases needed to apply this reading frame. Minimum size of a sequence to be consistent with this reading frame. In practice, the value equals to the end of the last exon or zero if there are no exons. If the reading frame is used with a shorter sequence, it can lead to errors.

property num_tot_bases¶: Number of bases of the reading frame. All introns and exons are included, starting from the start of the first exon up to end of the last one.

process(frame, keep_truncated=False)[source]¶: Reset instance with a new reading frame. All previously loaded data are discarded. The arguments of this method are identical to the constructor.

egglib.tools.to_codons(src, frame=None)[source]¶

Converts a DNA object to the codon alphabet. In the output alignment, there is one site per codon. The number of provided sites must be a multiple of 3.

Parameters:

aln – input alignment as an Align or Container instance. The alphabet must be alphabets.DNA.
frame – a tools.ReadingFrame instance providing the exon positions in the correct frame. By default, a non-segmented frame covering all sequences is assumed (in case the provided alignment is the coding region; in such case the length must be a multiple of 3).

Returns:

A new instance of type matching the input instance. The alphabet will be alphabets.codons.

egglib.tools.to_bases(src)[source]¶

Converts a codon object to the DNA alphabet. In the output object, there are three sites per codon.

Parameters:: aln – input object as an Align or Container instance. The alphabet must be alphabets.codons.
Returns:: A new instance of type matching the input instance. The alphabet will be alphabets.DNA.

egglib.tools.concat(align1, align2, ..., spacer=0, ch='?', group_check=True, no_missing=False, ignore_names=False, dest=None)[source]¶

Concatenates sequence alignments. A unique Align is generated. All different sequences from all passed alignments are represented in the final alignment. Sequences whose name match are concatenated. In case several sequences have the same name in a given segment, the first one is considered and others are discarded. In case a sequence is missing for a particular segment, a stretch of non-varying characters is inserted to replace the unknown sequence.

All options (excluding the alignements to be concatenated) must be specified as keyword arguments, otherwise they will be treated as part of the list of alignments.

Parameters:

align1
align2 – two or more Align instances (their order is used for concatenation). It is not allowed to specify them using the keyword syntax. All instances must be configured to use the same alphabet.
spacer – length of unsequenced stretches (represented by non-varying characters) between concatenated alignments. If spacer is a positive integer, the length of all stretches will be identical. If spacer is an iterable containing integers, each specifying the interval between two consecutive alignments (if there are n alignments, spacer must be of length n-1).
ch – character to used for conserved stretches and for missing segments. This character must be valid for the alphabet considered.
group_check – if True, an exception will be raised in case of a mismatch between group labels of different sequence segments bearing the same name. Otherwise, the group labels of the first segment found will be used as group labels of the final sequence.
no_missing – if True, an exception will be raised in case the list of samples differs between Align instances. Then, the number of samples must always be the same and all samples must always be present (although it is possible that they consist in missing data only). Ignored if ignore_names is True.
ignore_names – don’t consider sample names and concatenate sequences based on they order in the instance. If used, the value of the option no_missing is ignored and the number of samples is required to be constant over alignments.
dest – an optional Align instance to recycle and to use to store the result. This instance is automatically reset, ignoring all data previously loaded. If this argument is not None, the function returns nothing and the passed instance is modified. Allows to recycle the same object in intensive applications.

Returns:

If dest is None, a new Align instance. If dest is None, this function returns None.

egglib.tools.ungap(align, freq, gaps=None)[source]¶

Remove sites with too many gaps. Generate a new Align instance containing all sequences of the provided alignment but with only those sites for which the frequency of alignment gaps is less than or equal to freq.

Parameters:

align – input alignment as an Align.
freq – maximum gap frequency in a site (if there are more gaps, the site is not included in the returned Align instance). This value is a relative frequency (included in the [0, 1] range).
gaps – the list of allelic value(s) representing gaps. The argument must be a list or tuple containing one gap allele value, or more than one if appropriate. All values must be valid alleles for the alignment alphabet. By default, use the - character for the DNA and protein alphabets, and all triplets containing at least one alignment gaps for the codon alphabet. There is no default for other alphabets.

Returns:

A new Align instance.

egglib.tools.ungap_all(obj, gaps=None)[source]¶

Remove all gaps from an object. Generate a Container instance containing all sequences of the provided object excluding any gaps.

Parameters:

obj – input object as an Align or a Container.
gaps – the list of allelic value(s) representing gaps. The argument must be a list or tuple containing one gap allele value, or more than one if appropriate. All values must be valid alleles for the input alphabet. By default, use the - character for the DNA and protein alphabets, and all triplets containing at least one alignment gaps for the codon alphabet. There is no default for other alphabets.

Returns:

A Container instance.

egglib.tools.rc(seq)[source]¶

Reverse-complement a DNA sequence.

Parameters:: seq – input nucleotide sequence, as a str, or a SequenceView.
Returns:: The reverse-complemented sequence (see details below).

The case of the provided sequence is preserved. Ambiguity characters are complemented according to the table shown here. Invalid characters characters raise a ValueError.

egglib.tools.compare(seq1, seq2)[source]¶

Compare two sequences. The comparison supports ambiguity characters (see here) such that partially overlapping ambiguity sets characters are not treated as different. For example, A and M are not treated as different, nor are M and R. Only IUPAC characters may be supplied.

Parameters:

seq1 – a DNA sequence (as a str or a SequenceView).
seq2 – another DNA sequence (idem).

Returns:

True if sequences have the same length and they either are identical or differ only by overlapping IUPAC characters, False otherwise.

egglib.tools.regex(query, both_strands=False)[source]¶

Turn a DNA sequence into a regular expression. The input sequence should contain IUPAC characters only (see here). Ambiguity characters will be teated as follows: an ambiguity will match either (1) itself, (2) one of the characters it defines, or (3) one of the ambiguity characters that define a subset of the characters it defines. For example, a M in the query will match an A, a C or a M, and a D will match all the following: A, G, T, R, W, K, and D. The fully degenerated N matches all four bases and all intermediate ambiguity characters and, finally, ? matches all allowed characters (including alignment gaps). Note - only matches itself.

Parameters:

query – a str containing IUPAC characters or a SequenceView.
both_strands – look for the query on both forward and reverse strands (by default, only on forward strand).

Returns:

A regular expression as a str expanding ambiguity characters to all compatible characters.

Result of this function can be used with the module re to locate occurrences of a motif or the position of a sequence as in:

>>> regex = egglib.tools.regex(query)
>>> for hit in re.finditer(regex, subject):
...     print(hit.start(), hit.end(), hit.group(0))

The returned regular expression includes upper-case characters, regardless of the case of input characters. To perform a case-insensitive search, use the re.IGNORECASE flag of the re module in regular expression searches (as in re.search(egglib.tools.regex(query), subject, re.IGNORECASE). Note that the regular expression is contained into a group if both_strands is False, and two groups otherwise (one for each strand).

egglib.tools.motif_iter(subject, query, mismatches=0, both_strands=False, case_independent=True, only_base=True)[source]¶

Iterate over hits of a given query. Return an iterator over hits of the provided query in the subject sequence. The query should only contain bases (including IUPAC ambiguity characters as listed in here, but excluding - and ?). Ambiguity characters are treated as described in regex() (the bottom line is that ambiguities in the query can only match identical or less degenerate ambiguities in the subject). Mismatches are allowed, and the iterator will first yield identical hits (if they exist), then hits with one mismatch, then hits with two mismatches, and so on.

Parameters:

subject – a DNA sequence as str or a SequenceView. The sequence may contain ambiguity characters. In principle, the subject sequence should contain IUPAC characters only, but this is not strictly enforced (non-IUPAC characters will always be treated as different of IUPAC characters).
query – a DNA sequence as str or a SequenceView, also containing IUPAC characters only.
mismatches – maximum number of mismatches.
both_strands – look for the query on both forward and reverse strands (by default, only on forward strand).
case_independent – if True, perform case-independent searches.
only_base – if True, never create a hit if the putative hit sequence contains an alignment gap (-) or a ? character, irrespective of the number of allowed mismatches.

Returns:

An iterator returning, for each hit, a (start, stop, strand, num) tuple where start and stop are the position of the hit such that subject[start:stop] returns the hit sequence, strand is the strand as + or -, and num is the number of mismatches of the hit.

Note

If a given query has a hit on both strands at the same position (this only happens if the sequence is a motif equal to its reverse-complement like AATCGATT), it is guaranteed that the only one hit will be returned for a given position (no twice on each strand). However, the value of the strand flag is not defined.

Warning

This function is designed to be efficient only if the number of mismatches is small (only a few). A combination of a large number of mismatches (more than 2 or 3) and a large query sequence (as early as 10 or 20), will take significant time to complete. More complex problems require the use of genuine pairwise ocal alignment tools.

egglib.tools.translate(seq, code=1, allow_alt=False, in_place=False)[source]¶

Translates coding nucleotide sequences to proteins. See the class tools.Translator for more details. This is a convenience method allowing to translate nucleotide sequences in a single call. For repetitive calls, direct use of Translator can be more efficient.

Parameters:

seq – input codon sequences. Accepted types are Align, Container, SequenceView or str (or also types compatible with str). If the input object is an Align or a Container, the alphabet must be alphabets.codons.
code – genetic code identifier (see below). Required to be an integer among the valid values. The default value is the standard genetic code.
allow_alt – a boolean telling whether alternative start (initiation) codons should be considered. If True, codons are translated as a methionine (M) if, and only if, there are among the alternative start codons for the considered genetic code and they appear at the first position for the sequence. If seq is an Align, leading gaps are ignored as long as they appear as multiples of 3 (fully missing codons).
in_place – place translated sequences in the provided Align or Container instance, overwriting initial data. Not allowed if seq is not of one of these two types. Replace the original alphabet by alphabets.protein. By default, return a new instance.

Returns:

Protein sequences as an Align or a Container if either of these types have been provided as seq, or as a string otherwise. If in_place has been specified, return None.

All genetic codes defined by the National Center for Biotechnology Information are supported and can be accessed using codes corresponding to the GenBank /trans_table qualifier. The codes are integers.

Identifier	Code
1	The Standard Code
2	The Vertebrate Mitochondrial Code
3	The Yeast Mitochondrial Code
4	The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5	The Invertebrate Mitochondrial Code
6	The Ciliate, Dasycladacean and Hexamita Nuclear Code
9	The Echinoderm and Flatworm Mitochondrial Code
10	The Euplotid Nuclear Code
11	The Bacterial, Archaeal and Plant Plastid Code
12	The Alternative Yeast Nuclear Code
13	The Ascidian Mitochondrial Code
14	The Alternative Flatworm Mitochondrial Code
16	Chlorophycean Mitochondrial Code
21	Trematode Mitochondrial Code
22	Scenedesmus obliquus Mitochondrial Code
23	Thraustochytrium Mitochondrial Code
24	Rhabdopleuridae Mitochondrial Code
25	Candidate Division SR1 and Gracilibacteria Code
26	Pachysolen tannophilus Nuclear Code
27	Karyorelict Nuclear Code
28	Condylostoma Nuclear Code
29	Mesodinium Nuclear Code
30	Peritrich Nuclear Code
31	Blastocrithidia Nuclear Code
33	Cephalodiscidae Mitochondrial UAA-Tyr Code

Note that the following code identifiers do not exist: 7, 8, 15, 17, 18, 19, 20, and 32, as well as 0 and values above 33.

Reference: National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi]

class egglib.tools.Translator(code=1)[source]¶

Class to translate nucleotide sequences to proteins.

Parameters:: code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.

Methods

`translate_align`(src[, allow_alt, in_place])	Translate an `Align` instance.
`translate_codon`(codon)	Translate a single codon.
`translate_container`(src[, allow_alt, in_place])	Translate a `Container` instance.
`translate_sequence`(sequence[, allow_alt])	Translate a sequence.

translate_align(src, allow_alt=False, in_place=False)[source]¶

Translate an Align instance.

Parameters:

src – an Align containing codon sequences and using the alphabets.codons alphabet.
allow_alt – a boolean telling whether alternative start (initiation) codons should be considered. If True, codons are translated as a methionine (M) if they are among the alternative start codons for the considered genetic code and they appear at the first position for the considered sequence (excluding all triplets of gap symbols appearing at the 5’ end of the sequence).
in_place – place translated sequences into the original Align instance (this discards original data and replaces the original alphabet by the protein alphabet). By default, returns a new instance.

Returns:

By default, an original Align instance containing translated (protein) sequences. If in_place was True, return None.

translate_codon(codon)[source]¶

Translate a single codon. Translation is based on the genetic code defined at construction time. Alternative start codons are not supported.

Parameters:: codon – codon as a three-character string
Returns:: The one-letter amino acid code if the codon can be translated, or - if the codon is ---, or X in all other cases (including all cases with invalid nucleotides).

translate_container(src, allow_alt=False, in_place=False)[source]¶

Translate a Container instance.

Parameters:

align – a Container containing codons sequences.
allow_alt – a boolean telling whether alternative start (initiation) codons should be considered. If False, codons are translated as a methionine (M) if, and only if, they are among the alternative start codons for the considered genetic code and they appear at the first position for the considered sequence.
in_place – place translated sequences into the original Container instance (this discards original data). By default, return a new instance.

Returns:

By default, a new Container instance containing translated (protein) sequences. If in_place was True, return None.

translate_sequence(sequence, allow_alt=False)[source]¶

Translate a sequence.

Parameters:

sequence – a str, SequenceView or compatible instance containing codon sequences. Only upper-case strings.
frame – a ReadingFrame instance providing the exon positions in the correct frame. By default, a non-segmented frame covering all sequences is assumed (in case the provided alignment is the coding region and length must be a multiple of 3).
allow_alt – a boolean telling whether alternative start (initiation) codons should be considered. If False, codons are translated as a methionine (M) if, and only if, there are among the alternative start codons for the considered genetic code and they appear at the first position for the sequence.

Returns:

A new str instance containing translated (protein) sequences.

egglib.tools.orf_iter(sequence, code=1, min_length=1, forward_only=False, force_start=True, allow_alt=False, force_stop=True)[source]¶

Iterate over open reading frames. Return an iterator over non-segmented open reading frames (ORFs) found in a provided codon sequence in any of the six possible frames.

Parameters:

sequence – a string or a SequenceView representing a codon sequence.
code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.
min_length – minimum length of returned ORFs. This value must be at least 1. It is understood as the length of the encoded peptide (since ORFs are returned as nucleotide sequences, they will have a length of least three times this value).
forward_only – consider only the three forward frames (do not consider the reverse strand).
force_start – if True, all returned ORFs are required to start with a start codon. Otherwise, all non-stop codons are included in ORFs. Note that alternative start codons (CTG and TTG for the standard genetic code) are also supported.
allow_alt – allow alternative start codons (only considered if force_start is set).
force_stop – require that ORFs end with a stop codon (this only excludes 3’-partial ORFs, that is ORFs that end with the end of the provided sequennce).

Returns:

An iterator over all detected ORFs. Each ORF is represented by a (start, stop, length, frame) tuple where start is the start position of the ORF and stop the stop position (such as sequence[start:stop] returns the ORF sequence or its reverse complement [note that the stop codon, if present, is included]), length is the ORF length (number of amino acids, excluding any stop codon) and frame is the reading frame on which it was found: +1, +2, +3 are the frames on the forward strand (starting respectively at the first, second, and third base), and -1, -2, -3 are the frames on the reverse strand (starting respectively at the last, last but one, and last but two bases).

egglib.tools.longest_orf(sequence, code=1, min_length=1, forward_only=False, force_start=True, allow_alt=False, force_stop=True)[source]¶

Detect the longest open reading frame. Arguments are identical to orf_iter(). A ValueError is raised if two or more open reading frames have the largest length.

Returns:: A (start, stop, length, frame) tuple (see the return value of orf_iter() for details), or None if no open reading frame fits the requirements.

egglib.tools.backalign(nucl, aln, code=1, ignore_names=False, ignore_mismatches=False, fix_stop=False, allow_alt=True)[source]¶

Align coding sequence based on the protein alignment.

Parameters:

nucl – a Container or Align instance containing coding sequences that should be aligned. There should not be any alignment gap in the sequences. The alphabet must be alphabets.codons.
aln – an Align instance containing an alignment of the protein sequences encoded by the coding sequences provided as nucl. Group labels of aln are not taken into account. The alphabet must be alphabets.protein.
code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.
ignore_names – if True, ignore names for matching sequences in the protein alignment to coding sequences. Sequences will be matched using their rank and the names in the returned alignment will be taken from nucl.
ignore_mismatches – if True, do not generate any exception if a predicted protein does not match the provided protein sequence (if the lengths differ, an exception is always raised).
fix_stop – if True, support a single trailing stop codon in coding sequences not represented by a * in the provided protein alignment (if the final stop codons have been stripped during alignment). If found, this stop codon will be flushed as left as possible (immediately after the last non-gap character) in the returned coding alignment.

Returns:

An Align instance containing aligned coding codon sequences.

If a mismatch is detected between a protein from aln and the corresponding prediction from nucl, an instance of tools.BackalignError (a subclass of ValueError) is raised. Its attribute alignment can be used to help identify the reason of the error. Mismatches (but not differences of length) can be ignored with the option ignore_mismatches.

class egglib.tools.BackalignError(name, fnuc, faln, i_nuc, i_aln, ls_aln, ls_nuc, translator)[source]¶

Bases: ValueError

Exception used to report errors occurring during the use of tools.backalign() because of mismatches between the provided alignment and predicted proteins.

Attributes

`alignment`	Comparaison of provided and predicted proteins.
`name`	Name of sequence for which the error occurred.

Methods

add_note(object, /)

Exception.add_note(note) -- add a note to the exception

property alignment¶

Comparaison of provided and predicted proteins. This string shows the two proteins with a middle line composed of the following symbols:

|: match.
#: mismatch.
~: one protein shorter.

property name¶: Name of sequence for which the error occurred.

egglib.tools.trailing_stops(align, action=0, code=1, replacement='???')[source]¶

Process stop codons at the end of the sequences. This function detects and (optionally) fix trailing stop codons. The algorithm consists in locating the last non-gap codon of each sequence. If the last codon is partially gapped (which can happen if coding sequences are aligned using a DNA-based algorithm), it will not be detected and ignored.

Parameters:

align – an Align containing aligned coding sequences. The alphabet must be alphabets.codons.
action –
an integer specifying what should be done if a stop codon is found at the end of a given sequence. Possible actions are listed in the following table:

Code

Action

0

Nothing (just count them).

1

Replace them by a gap, and delete the last position if it is made by gaps only.

2

Replace them by the value given as replacement.

Note that using action=1 is not stricly equivalent to using action=2, replacement='---' because the former deletes the last three positions of the alignment if needed while the latter does not.
code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.
replacement – if action is set to 2, provide the codon that should be used to replace stop codons.

Returns:

Code	Action
0	Nothing (just count them).
1	Replace them by a gap, and delete the last position if it is made by gaps only.
2	Replace them by the value given as replacement.

The number of sequences that had a trailing stop codons among the considered sequences.

egglib.tools.iter_stops(src, code=1)[source]¶

Iterate over stop codons. Return an iterator providing the coordinates of all stop codons over all sequences. Each iteration returns a (sample, position) where sample is the sample index and position is the index of the stop codon.

Parameters:

src – an Align or Container containing coding sequences. The alphabet must be alphabets.codons.
code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.

Returns:

An iterator over the (sample, position) tuples corresponding to each stop codon found in the alignment (see above).

egglib.tools.has_stop(src, code=1)[source]¶

Test if there is at least one stop codon. Return True if the sequences contain at least one codon stop at any position in any sequence, and False otherwise.

Parameters:

src – an Align or Container containing aligned coding sequences.
code – genetic code identifier (see here). Required to be an integer among the valid values. The default value is the standard genetic code.

Returns:

A boolean.