Coalescent simulator¶
EggLib includes in this module a simulator based on the standard coalescent for diploid individuals with partial, and optional, self-fertilization. More details are given in a section of the manual.
The class coalesce.Simulator
manages all parameters and lets
the user run coalescence simulations. The other classes defined in this
module help managing parameters.
|
Manager of the coalescent simulator. |
|
|
|
|
|
|
|
Class storing the list of demographic events. |
- class egglib.coalesce.Simulator(num_pop, migr=0.0, **kwargs)[source]¶
Manager of the coalescent simulator. The constructor takes arguments controlling the demographic and mutation models used for simulation. Only the number of populations is required at the time of construction. Once it is set, it can be never modified. Other constructor arguments are all optional and can be set or modified later using the
params
instance attribute (either using itsupdate()
method or the[]
operator, such as insimulator.params['theta'] = 2.85
. Keyword arguments are passed as is toupdate()
. List-based parameters can be set if all values are provided in a sequence. Matrix-based parameters cannot be set here.- Parameters:
num_pop – number of populations.
migr – migration rate.
Other keyword arguments can be any of the parameters defined in the table below.
Parameter
Definition
Default
num_pop
Number of populations
None, required
num_sites
Number of sites
0 (ISM)
num_mut
Fixed number of mutations
0
theta
\(4N_0\mu\) parameter
0.0
recomb
\(4N_0c\) parameter
0.0
mut_model
Mutation model (among
KAM
,IAM
,SMM
andTPM
)KAM
TPM_proba
Probability parameter of TPM
0.5
TPM_param
Shape parameter of TPM
0.5
num_alleles
Number of alleles for KAM
2
rand_start
Pick start allele randomly for KAM (boolean)
False
num_chrom
Number of sampled chromosomes (per population)
0 for all
num_indiv
Number of sampled individuals (per population)
0 for all
N
Population size, expressed relatively to \(N_0\) (per population)
1.0 for all
G
Exponential growth/decline rate, negative values mean decline (per population)
0.0 for all
s
Population selfing probability (per population)
0.0 for all
site_pos
Site position, as values between 0.0 and 1.0 (per site)
Equally spread
site_weight
Site mutation weight, controlling the relative probability of sites (per site)
1.0 for all
migr_matrix
Pairwise migration rate matrix (the diagonal cannot be set)
0.0 for all
trans_matrix
Matrix of transition weights between pairs of alleles
1.0 for all
events
List of events added by the user
Empty
max_iter
Maximum number of iterations
100,000
The following table presents the categories of events that can be added to the
events
list usingadd()
.Event code
Description
Parameters
size
Change population size
T
– date (1)N
– new sizeidx
– population index (2)migr
Change all migration rate
T
– date (1)M
– migration rate (all pairwise migration rates are set toM/(num_pop-1)
)pair_migr
Change pairwise migration rate
T
– date (1)M
– pairwise migration ratesrc
– source population indexdst
– destination population indexgrowth
Change population exponential growth/decline rate
T
– date (1)G
– new rateidx
– population index (2)selfing
Change population self-fertilization rate
T
– date (1)s
– new rateidx
– population index (2)recombination
Change recombination rate
T
– date (1)R
– new ratebottleneck
Apply a bottleneck
T
– date (1)S
– bottleneck strength (3)idx
– population index (2)admixture
Move lineages from one population to another
T
– date (1)proba
– migration probability (in [0,1] rangesrc
– source population indexdst
– destination population indexmerge
Merge a population to another (take all lineages from
src
, move them todst
, and removesrc
T
– date (1)src
– source population indexdst
– destination population indexsample
Perform a delayed a some point in the past in one of the populations
T
– date (1)idx
– population indexlabel
– group label (4)num_chrom
– number of sampled chromosomesnum_indiv
– number of sampled individualsTime is expressed in units of \(4N_0\) generations.
If
idx
is omitted, the event is applied to all populations at once.Bottleneck strength is expressed in time units. A bottleneck is implemented as a period of time during which coalescences are the only event allowed to occur.
The label of delayed sample can be set to the same value than the populations index (in this case, delayed samples will have the same label than normal samples from the same population), or to a different label, as the user’s option.
The parameters
num_chrom
,num_indiv
,N
,G
,s
,site_pos
, andsite_weight
are represented byParamList
instances that behave like lists (except that their length cannot be changed). In particular,ParamList
support subscript indexing and it can also be initialized by passing a sequence. The parametersmigr_matrix
andtrans_matrix
are represented byParamMatrix
instances that support a double-index subscript system to read/changes values (as inparams['migr_matrix'][i,j]
to access the value at row i and column j. Diagonal values are read asNone
and cannot be changed. Finally,events
is represented by aEventList
instance that exhibit limited list functionality and provides methods to add, read, and modify events. See this class for more information about out editing the list of events.Attributes
An
Align
instance containing simulated data for the current iteration ofiter_simul()
.Simulation parameters of this instance.
Methods
iter_simul
(nrepet[, cs, dest_trees])Perform several simulations.
simul
([dest])Perform a single simulation.
- property align¶
An
Align
instance containing simulated data for the current iteration ofiter_simul()
. The data in this instance will be updated at each iteration round, and deleted at the end of the iteration. If it must be copied, a deep copy is required (typically withAlign.create()
).
- iter_simul(nrepet, cs=None, dest_trees=None, **feed_params)[source]¶
Perform several simulations. Simulations are conditioned on the current value of parameters stored in
params
. Return an iterator that will loop over the requested number of simulations. The simulated alignments are always available at each iteration loop as the instance attributealign
. By default, all iterations return a reference to this instance, but if cs is specified a dictionary of statistics is returned at each simulation.- Parameters:
nrepet – number of simulations. If
None
, iterate for ever (you are required to usebreak
in loop with your own stopping criterion is reach).cs –
ComputeStats
instance properly configured to compute statistics from simulated data. If cs is specified, each iteration round will yield the dictionary of statistics obtained fromprocess_align()
called on this object. See also cs_filter and cs_struct.dest_trees – a
list
in which simulated trees will be appended. Since each simulation can yield several trees (because of recombination), a new sub-list will be appended for each simulation, with one item for each tree. Each tree covered a defined region of the simulated chromosome, so each item of each sub-list will be(tree, start, stop)
tuple
withtree
as a newTree
instance, andstart
andstop
as the start and stop positions (real numbers comprised between 0 and 1). If recombination occurred, trees will not be sorted within their sub-list. If recombination did not occur, each simulation will be represented by a list with a singletuple
. Any previously data contained in dest_tree is left untouched.feed_params – other arguments provide sequences of values for any parameter (except
num_pop
), allowing to modify any set of parameters between simulations. All changes are permanent and affect any later simulation. All additional options must be in thekey=value
format, with have one of the parameters askey
, and a sequence of values asvalue
. All sequences must be of length at least equal to nrepet. If longer, additional values are ignored. For list-based parameters, each value must be a(index, value)
tuple and, for matrix-based parameters, each value must be a(index1, index2, value)
tuple.
- Returns:
An iterator, yielding a reference to
align
if cs isNone
, or otherwise a dictionary of computed statistics.
- property params¶
Simulation parameters of this instance. This object is a instance of
ParamDict
, which is a clone ofdict
that does not let the users add or remove parameters, but lets them modify values of parameters (similarly, the number of items of per-population or per-site parameters cannot be modified).
- simul(dest=None)[source]¶
Perform a single simulation. The simulation is conditioned on the current value of parameters stored in
params
.- Parameters:
dest – An
Align
to reset using simulated data. All previous data will be lost.- Returns:
A new
Align
instance containing the simulated data unless dest is specified (otherwiseNone
).
Note
The method
iter_simul()
can be much more efficient. For performance-critical applications, its use is recommended.
- class egglib.coalesce.ParamDict(npop)[source]¶
dict
-like class managing parameters. Do most of what a dictionary does except for adding and removing keys. The order of parameters is fixed and consistent.In addition to the methods documented below,
ParamDict
instances support the following operations (whereparams
in one instance):Expression
Action
len(params)
Number of parameters
params[key]
Get the value of parameter
key
params[key] = value
Assign
value
to parameterkey
(see note below)for key in params
Same as
for key in params.keys()
reversed(params)
Reversed iterator
key in params
Check if
key
in a parameter namestr(params)
Representation of the instance as a
dict
Note that the
params[key] = value
expression is straightforward only for parameters that have a single value. For parameters represented by aParamList
instance, this expression is only supported if the right-hand operand is a sequence of matching length (in order to set all values at once). For parameters represented by aParamMatrix
instance, this expression is only supported if the right-hand operand is anotherParamMatrix
instance of matching dimension. Forevents
, this expression is not supported at all. In all cases whereparams[key]
returnsParamList
,ParamMatrix
or aEventList
instance, the returned value can be modified using its own methods.Methods
add_event
(cat, T, **params)Add an event to the list.
copy
()Shallow copy.
Disable transition weight matrix.
get
(key[, default])Get a parameter value.
get_values
(other)Import parameter values.
has_key
(key)Tell if a parameter exists.
items
()Iterator over content of the instance.
keys
()Iterator over the parameter names.
mk_structure
([skip_indiv, outgroup_label])Create structure object.
set_migr
(value)Set migration rates.
summary
()Parameter summary.
update
([other])Import parameter values.
values
()Iterator over the parameter values.
- add_event(cat, T, **params)[source]¶
Add an event to the list. See the documentation for the class
Simulator
for more details on what is expected as arguments.- Parameters:
cat – category of the event.
T – date of the event.
params – all needed parameters for this category of events.
- copy()[source]¶
Shallow copy. Return a shallow copy of this instance. All modifications of the values of either copy will modify the other one (even modification of integer, float and string parameters).
- disable_trans_matrix()[source]¶
Disable transition weight matrix. Disable the matrix of weights of transition between alleles (it is disabled by default, and automatically activated i any value is set). This sets all weights to 1.0.
- get(key, default=None)[source]¶
Get a parameter value. Return the value for key if key is one of the parameters, else return the value passed as default (by default,
None
). This method therefore never raises aKeyError
.
- get_values(other)[source]¶
Import parameter values. Update the dictionary with values from another
ParamDict
instance (with the same numbers of populations and sites).
- has_key(key)[source]¶
Tell if a parameter exists. Return a boolean indicating if the passed name is one of the parameters.
- items()[source]¶
Iterator over content of the instance. Each iteration round yield a
(key, value)
parameter name and value pair.
- mk_structure(skip_indiv=False, outgroup_label=None)[source]¶
Create structure object. Export a
Structure
instance containing the structure information corresponding to currently loaded simulation parameters.- Parameters:
skip_indiv – this argument determines whether the individuals level should be skipped (if
True
, alleles from each given individual are treated as belonging to separate haploid individuals.outgroup_label –
this argument indicates the label of the outgroup population. It must be passed as a string. Default value is None.
Warning
The individual level cannot be processed if the ploidy is not constant (mixing sampled chromosomes and sampled individuals).
- Returns:
A new
Structure
instance.
- set_migr(value)[source]¶
Set migration rates. Set all pairwise migration rates to
value/(num_pop-1)
.
- summary()[source]¶
Parameter summary. Return a string displaying all current values of parameters at the level of the C++ simulator. Use for debugging only, as the format is not guaranteed to be stable.
- update(other=None, **kwargs)[source]¶
Import parameter values. Update the dictionary with the
(key, value)
parameter name and value pairs from the object passed as other, overwriting existing keys and raising aKeyError
exception if an unknown parameter name is met.Accepts
dict
instances, or else any iterable of(key, value)
pairs. If keyword arguments are specified, the instance is then updated with those, as inparam_dict.update(theta=5.0, recomb=2.5)
.
- class egglib.coalesce.ParamList(getter, setter, num_values, check_item)[source]¶
list
-like class managing values for a parameter. Do most of what a list does except for adding and removing values. No methodsreverse()
andsort()
are provided either as they make little sense. Expressions involving the+
and*
arithmetic operators are supported, but return genuinelist
instances that are disconnected from the original parameter holder.In addition to the methods documented below,
ParamList
instances support the following operations (whereitems
is aParamList
instance):Expression
Action
len(items)
Number of items
items[i]
Get ith item
items[i:j]
Slice from i to j
items[i:j:k]
Slice from i to j with step k
items[i] = value
Assign
value
to parameterkey
items[i:j] = values
Replace a slice of values (with a sequence of the same length)
items[i:j:k] = values
Replace a slice of values (with a sequence of the same length)
for item in items
Iterates over items
reversed(items)
Reversed iterator
item in items
Check if
item
is present among itemsitems + other
Same as
list(items) + other
(returns alist
)items * n
Same as
list(items) * n
(returns alist
)str(items)
Representation of the instance as a list
The indexing operator (
[]
) supports negative indexes (to count from the end) and slices operators, just as for the built-in typelist
. However, all operators (such asdel
) or methods (such asappend()
,extend()
,remove()
) that would change the length of the list are not available.Methods
count
(value)Number of items that are equal to value.
index
(value[, imin, imax])Index of the first value matching value.
- index(value, imin=0, imax=None)[source]¶
Index of the first value matching value. The returned value is
>=imin
and<imax
if they are provided. Raise aValueError
if the value is not found.
- class egglib.coalesce.ParamMatrix(getter, setter, num_values, check_item, check_active, set_active)[source]¶
list
-like class managing values for a parameter as a matrix. Similar toParamList
except that it holds a matrix. It therefore implements less methods (no support for+
and*
operators, no slices) and use double indexing system as inmatrix[i, j] = x
, allowing both setting and getting values. The iterator, such as infor i in matrix
, flattens the matrix, returning all values from the first row, then those from the second tow, and so on (replacing the diagonal values byNone
).len(matrix)
returns the dimension (number of rows, which is the same as the number of columns).In addition to the methods documented below,
ParamMatrix
instances support the following operations (wherematrix
is aParamMatrix
instance):Expression
Action
len(matrix)
Size of the matrix (as number of rows/colums)
matrix[i,j]
Get jth item of the ith row (
None
for diagonal)matrix[i,j] = value
Assign
value
to (i,j)th item (no diagonal)for item in matrix
Iterates over items (row-first flat iteration)
reversed(matrix)
Reversed iterator
item in matrix
Check if
item
is present among itemsstr(matrix)
Representation of the instance as a nested list (including diagonal)
It is not possible to slice-set ranges of values in the matrix, but one can set all values at once from a compatible source with
get_values()
.Methods
count
(value)Number of items that are equal to value.
get_values
(other)Import parameter values.
index
(value[, imin, imax, jmin, jmax])Get the index of a given value.
- get_values(other)[source]¶
Import parameter values. Get all values from other, which must be another
ParamMatrix
or a nested sequence (such as alist
orlist
). The dimension of other must be the same as the current instance. Not that in the input object is not aParamMatrix
, any value it can have on the diagonal will be ignored (but they must be present to avoid shifting indexes).
- index(value, imin=0, imax=None, jmin=0, jmax=None)[source]¶
Get the index of a given value. Return the
(i, j)
tuple of indexes of the first value (iterating first over rows and then over columns) matching value. The returned row index is >=imin and <imax and the returns column index is >=jmin and <jmax if they are provided. Raise aValueError
if the value is not found. The diagonal is not considered.
- class egglib.coalesce.EventList(npop, add, clear)[source]¶
Class storing the list of demographic events. Even if events appear as unsorted, they are internally sorted so that the user does not have to care about their order. This class has limited functionality: add events (but not removing them), accessing and modifying parameters.
In addition to the methods documented below,
EventList
instances support the following operations (whereevents
is anEventList
instance):Expression
Action
len(events)
Number of events currently loaded
events[i]
Dctionary with parameters of the ith event (see note)
for event in events
Same as
for i in range(len(events)): event = events[i]
str(events)`
Representation of the instance, roughly as a list
The string representation is such as a string at the first level, but each event is represented as an angle-bracketed delimited string containing comma-separated
key=value
pairs with the parameter askey
and its value asvalue
(with an additional key,event_index
, providing the index of the event in the list).Note
There is no way to modify content of the instance using the indexing operator
events[i]
. One must useupdate()
.More information on the list of events and their parameters is available in the documentation for class
Simulator
.Methods
add
(cat, T, **params)Add an event to the list.
clear
()Clear list of events.
replace
(other)Replace list of events.
update
(event_index, **params)Modify any parameter from one of the event of the list.
- add(cat, T, **params)[source]¶
Add an event to the list. See the documentation for the class
Simulator
for more details on what is expected as arguments.- Parameters:
cat – category of the event.
T – date of the event.
params – all needed parameters for this category of events.
- replace(other)[source]¶
Replace list of events. Replace own list of events with the one in the
EventList
instance passed as other. The current list of events is dropped.
- update(event_index, **params)[source]¶
Modify any parameter from one of the event of the list. If an event’s date is modified, sorting will be updated automatically.
- Parameters:
event_index – index of the event to modify (based on the order in which events have been specified with
add()
, which is the same order as events appear when representing the instance or iterating).params – keyword arguments specifying what parameters to modify. Only parameters that have to be changed should be specified.