Use of historical events¶

In this section we provides some pointers on defining demographic models involving past events using coalesce.

Bottlenecks¶

EggLib supports changing the size of populations at any point in the history of the sample. Thus, a bottleneck can straightforwardly implemented by two consecutive population sizes (first reduction, and then restoration of the same size). Assume the following model:

The implementation would be:

>>> c = egglib.coalesce.Simulator(num_pop=1, num_chrom=[50], theta=5.0)
>>> c.params.add_event(cat='size', T=0.20, N=0.1)
>>> c.params.add_event(cat='size', T=0.21, N=1.0)

This approach has the advantage of being easier to interpret, and allowing independent population size before and after the bottleneck.

A second way to implement bottlenecks is the bottleneck event, which has the advantage of saving one parameter. Instead of requiring both duration and strength parameters, it requires only a strength parameter. The code is of the type:

>>> c.params.add_event(cat='bottleneck', T=0.2, S=0.5)

This model can be represented by:

In the standard bottleneck (first model), the main consequence of the bottleneck is an increase rate of coalescence events. This second implementation assumes that the bottleneck is short enough so that its duration can be neglected. Instead, a random number of coalescence events are instantaneously performed at the time of the bottleneck. The number of coalescence events is proportional to the S parameter and is equivalent to what would be if the population would have been allowed to evolved during a time given by S (as represented by the insertion of a period of time equal to S in the picture above).

Merge, admixture, and split events¶

The principle of fixing once and for all the number of populations does not impose any limitation in the implementation of complex models, which can all be implemented by available historical events. In this regard, the admixture event is essential.

Its definition is: at time T, take a random number of lineages from the population src (with probability proba), and move them instantly to the population dst. See the example below along with its implementation:

>>> c = egglib.coalesce.Simulator(num_pop=2)
>>> c.params.add_event(cat='admixture', T=0.5, src=0, dst=1, proba=0.5)

The parameter proba=5 means that every lineage present at that point in the first population will have a probability 0.5 to move to population 1. Obviously, we would have to add samples to make a working example. In coalescent models, admixture means that lineages move (in this case) from population src to population dst, but it is important to realise that, biologically speaking, this means that population dst has sent a bunch of migrants to population src (everything is reverted since the model works backward in time).

The population split event does not exist on its own, but can readily implemented as an admixture. Again, remember that the term split must be understood backward in time. To implement it, add one population with all migration rates to it set to 0 (which is the default anyway), and without samples. This is a “ghost” population with no effect on your sample since no lineage can move to it. Program an admixture event to occur at the time of the split, sending a given proportion of lineages to this ghost population which, at this point, ceases to be ghost. Of course, the size of all populations can be set freely at the start of the population, and changed at any time by the historical event size.

>>> c = egglib.coalesce.Simulator(num_pop=2, num_chrom=[50, 0])
>>> c.params.add_event(cat='admixture', T=0.5, src=0, dst=1, proba=0.5)

The event merge is actually available, here we describe how it is implemented in order to demonstrate the power of event management in the coalesce module.

To perform a population merge, it is possible to define an admixture where all samples from one of the populations are taken to the other one (proba=1). To make it a proper merge event, the migration rate to the merged (donor) population must be cancelled. So, if the migration rates have not been set to a non-zero, value, the admixture event is enough to implement a merge:

>>> c = egglib.coalesce.Simulator(num_pop=2, num_chrom=[20, 20])
>>> c.params.add_event(cat='admixture', T=0.5, src=1, dst=0, proba=1)

But if the migration rate is non-zero it is also necessary to set all migration rates to the donor population to 0.

As said above, there is a merge event that perform all required operations:

>>> c = egglib.coalesce.Simulator(num_pop=2, num_chrom=[20, 20], migr=1)
>>> c.params.add_event(cat='merge', T=0.5, src=1, dst=0)

Ghost populations and unsampled populations¶

When considering complex models, it can be useful to make a clear distinction between regular populations, ghost populations, and unsampled populations.

Regular populations are populations where samples have been placed.
Ghost populations are populations where there are no samples, and to which all migration rates from any non-ghost populations are null. The essential fact is that no lineages can ever go to this type of population so they don’t have any effect on the final coalescent.
Unsampled populations are populations where samples have not been placed, but where lineages can move by migration.

In the below example, population 0 is regular, population 1 is unsampled, and population 2 is a ghost:

>>> c = egglib.coalesce.Simulator(num_pop=3, num_chrom=[20, 0, 0])
>>> c.params['migr_matrix'] = [[None, 1, 0],
...                            [1, None, 0],
...                            [0, 0, None]]

Regular and unsampled populations are actually the same because the important fact is that they are allowed to contain lineage at any time.

Completing simulations¶

An important point while designing a model is to prevent any situation where the coalescent process will not be able to complete. Such situation occurs if a population is experiencing exponential decline (as mentioned when describing the parameter G) or in any case when samples are allowed to end up trapped in different populations with no chance to ever coalesce. In either case, the simulator will not be able to complete the tree and a RuntimeError will occur. The example below shows the most simple case where this error is bound to occur:

>>> c = egglib.coalesce.Simulator(num_pop=2, num_chrom=[1, 1])
>>> c.simul()
[traceback omitted]
RuntimeError: infinite coalescent time (unconnected populations or excessive ancestral population size)

It is important to note that the problem might not be systematic. For example, in the example below, there is a degree of freedom that makes that the error will occur randomly:

>>> c = egglib.coalesce.Simulator(num_pop=3, num_chrom=[20, 0, 0])
>>> c.params['migr_matrix'] = [[None, 1, 0],
                               [1, None, 0],
                               [0, 0, None]]
>>> c.params.add_event(cat='admixture', T=0.3, src=1, dst=2, proba=0.2)
>>> c.params.add_event(cat='merge', T=0.5, src=0, dst=1)
>>> for x in c.iter_simul(1000):
...     print('done')
...
done
done
done
done
[traceback omitted]
RuntimeError: infinite coalescent time (unconnected populations or excessive ancestral population size)

Here the error occurs at the fifth repetition (but this is random). It is caused as soon as two different lineages end up trapped in, respectively, the two last populations. Since the migration rate between them is null and there is no additional historical even planned, they can never coalesce (but this is not required to happen each time).

To prevent this error, it is required to allow the complete coalescence of the sample at some point of the past, either by allowing migration (setting non-null migration rates) or merging populations (and also by preventing any population to increase in size indefinitely due to the G option).

Occasionally, you might meet another error message:

>>> c = egglib.coalesce.Simulator(num_pop=3, num_chrom=[20, 0, 0])
>>> c.params['migr_matrix'] = [[None, 1, 0],
...                            [1, None, 0],
...                            [0, 0, None]]
>>> c.params.add_event(cat='admixture', T=0.5, src=1, dst=2, proba=0.2)
>>> for x in c.iter_simul(1000):
...     print('done')
...
done
done
done
done
done
done
done
done
done
done
done
done
[traceback omitted]
RuntimeError: failed to complete coalescent tree: two lineages might be trapped to unconnected populations (if you are sure your model is correct, increase the parameter `max_iter`)

In this case, two lineages are again trapped because the last population is not connected with the others, but the one lineage which is not in the last population can still migrate between the first two. As a result, there is still always an event to be applied by the coalescent algorithm, but this can never complete the coalescent. To prevent this, a bound is included, stopping the algorithm after a large number of iterations.

It is possible to increase this bound in case of models taking very long (that is, a very large number of iteration steps) but still ensuring completion of the coalescent (see Maximum number of iterations).

Delayed samples¶

EggLib’s coalescent simulator supports historical (delayed) sample, that is samples that have been collected at some known point in the past and would be analysed along with contemporary samples. The sampling date must be known and expressed in units of \(4N_0\) generations as all other dates. Delayed samples must be taken from one of the populations declared at the instance construction, whether or not this population has an initial sample. It is thus possible to simulate the temporal evolution of a population through consecutive samplings:

>>> c = egglib.coalesce.Simulator(num_pop=1, num_chrom=[20])
>>> c.params.add_event(cat='sample', T=0.5, idx=0, label='1', num_chrom=20, num_indiv=0)

It is also possible to sample a previously unsampled population:

>>> c = egglib.coalesce.Simulator(num_pop=2, num_chrom=[20, 0])
>>> c.params.add_event(cat='sample', T=0.2, idx=1, label='1', num_chrom=20, num_indiv=0)
>>> c.params.add_event(cat='merge', T=1.0, src=1, dst=0)

Note that the label option can help discriminate the different sampling in the downstream analyses. Like initial samples, it is possible to specify both haploid (individuals for which only one chromosome has been sampled) and diploid (individuals with both chromosomes sampled) samples.