EggLib
EggLib version 2.1 is archived.

fitmodel

Approximate Bayesian Computation components. Some of the underlying C++ utilities currently have no Python wrapper and must be used directly through the binding. Please refer to the C++ library documentation for the following class:

  • ABC: the class to perform the rejection-regression operations.

Utilities

class egglib.fitmodel.Dataset

Bases: object

Manages a set of read or simulated alignments. Supports len(). Note that, on observed data with several populations and/or outgroups, sort_aligns() should be called to handle the case where the different populations are mixed within alignments.

add(align)

Adds an Align instance. The instance is stored by reference (no copy is made).

config()

Returns a list with, for each alignment, a tuple of 4 items: total sample size, list of sample sizes per population (excluding outgroups), number of outgroups, and alignment length. The alignment length excludes unusable sites (it corresponds to lseff). Populations are sorted by their label.

iterator(config=None)

This iterator zips the object passed as config with the alignments stored in the instance. The user should ensure that the object passed as config is an iterable and has the same length as the current instance (this is not checked automatically). Each iteration round returns an (align, configItem) tuple, where configItem is an item of config. If config is None, the result of config() will be used.
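The pairing described above resembles Python's built-in zip(). The following is a minimal sketch under stated assumptions, not EggLib code: the helper iterate_with_config is hypothetical, and plain strings stand in for Align instances.

```python
def iterate_with_config(aligns, config):
    """Pair each alignment with the matching config item.

    Hypothetical stand-in for the iterator described above. No length
    check is made: in this sketch, zip() simply stops at the shorter input.
    """
    for align, config_item in zip(aligns, config):
        yield align, config_item


# Stand-ins for three alignments and their per-locus config tuples:
# (total sample size, sample sizes per population, outgroups, usable length).
aligns = ["locus1", "locus2", "locus3"]
config = [(10, [6, 4], 0, 500), (12, [6, 6], 1, 480), (8, [8], 0, 350)]

pairs = list(iterate_with_config(aligns, config))
```

Because nothing checks the lengths, a config that is too short would silently drop alignments in this sketch, which is why the caller is asked to verify the lengths beforehand.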

pops()

Returns the set of distinct populations in the data set (note that not all populations are required to be represented at each locus). The outgroup is not considered.

sort_aligns()

Sorts each alignment so that it matches the exported config (all populations appear grouped and in increasing order, with outgroups at the end).

class egglib.fitmodel.ParamSample(length)

Bases: object

Holds a list of float values (parameters) of fixed length. Supports len(), use of the subscript ([]) operator for accessing or modifying values (but not deleting) and str() (str(paramSample) or with print()). String formatting returns a space-delimited string of the values.

The constructor expects a length argument to fix the number of parameters.

values()

Gets a deep copy of the values contained in the instance.
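The behaviour listed for ParamSample (fixed length, subscript access, space-delimited string form, copied values) can be illustrated with a small pure-Python stand-in. This is a sketch only: the class name ParamSampleSketch is hypothetical, and the initial value of 0.0 is an assumption not stated in this page.

```python
class ParamSampleSketch(object):
    """Hypothetical stand-in mimicking the documented ParamSample interface."""

    def __init__(self, length):
        # Assumption: values start at 0.0 (not specified in this page).
        self._values = [0.0] * length

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        # Values are stored as floats; no __delitem__ is defined,
        # so items cannot be deleted and the length stays fixed.
        self._values[index] = float(value)

    def __str__(self):
        # Space-delimited string of the values.
        return " ".join(str(v) for v in self._values)

    def values(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._values)


sample = ParamSampleSketch(3)
sample[1] = 2.5
```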

egglib.fitmodel.import_posterior(fname)

Imports a posterior file.

fname must be the name of a file containing fitted ABC data, one sample per line and one parameter per column. Header line is optional and is automatically detected. If present, no parameter name can be provided as a number.

Returns a tuple (params, data) where params is the list of parameter names (generated automatically if no header is present) and data is a list of lists, one list per parameter (note that the returned list is transposed with respect to the input file). Beware that headers with number-only parameter names will be mistaken for values.
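The header detection and transposition described above can be sketched as follows. This is an illustration, not the library's implementation: the function parse_posterior is hypothetical, it works on a string rather than a file name, columns are assumed to be whitespace-separated, and the automatic names param1, param2, ... are invented for the example.

```python
def parse_posterior(text):
    """Return (names, data) with one list of floats per parameter."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return [], []

    def is_number(token):
        try:
            float(token)
        except ValueError:
            return False
        return True

    first = rows[0]
    if all(is_number(token) for token in first):
        # No header: every token of the first line parses as a number.
        # The generated names are an assumption of this sketch.
        names = ["param%d" % (i + 1) for i in range(len(first))]
    else:
        names = first
        rows = rows[1:]

    # Transpose: one list per column (parameter) instead of one per line.
    data = [[float(row[i]) for row in rows] for i in range(len(names))]
    return names, data


names, data = parse_posterior("THETA ALPHA\n1.0 2.0\n3.0 4.0\n")
```

The all-numeric test on the first line also shows why a header made only of numbers cannot be told apart from a line of values.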

Prior implementations

class egglib.fitmodel.PriorParseError

Raised by Prior parse() methods when the format is found to be incorrect. It can be caught to auto-detect prior types.

class egglib.fitmodel.PriorDiscrete(random=None)

Bases: object

This prior is based on discrete categories. It consists of a set of weighted categories with free boundaries. Within a category, the probability density is uniform. It allows using uniform distributions with fixed bounds, discretized empirical distributions and theoretical distribution laws as priors.

PriorDiscrete instances have a length (the number of categories) and are iterable. Each iteration yields a (p, bounds) tuple with p the frequency of a class and bounds a list giving the bound values (themselves as a tuple) for all parameters.

Constructor argument random must be a Random instance and will be used to generate pseudorandom numbers.

add(freq, *bounds)

Adds a category to the distribution. freq gives the frequency of the category. The frequencies don’t need to be relative. bounds must be separate 2-item lists or tuples giving the lower and upper bound values for each parameter.

clear()

Clears the instance.

draw()

Generates a set of random values for all parameters.

force_positive()

Enforces that drawn parameter values are >=0 (values <0 will be ignored). This flag is not cancelled if clear() is called.

New in version 2.0.2.

number_of_params()

Returns the number of parameters (0 if no data is loaded).

parse(string)

Imports data from the string string. The data format is: one line per category (in any order), each line following the format freq down1;up1 down2;up2 ... where freq is the frequency of the category (it need not be relative), down is the lower bound value and up the upper bound value for a given parameter. The function raises a PriorParseError in case of format error.
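To make the line format concrete, here is a minimal sketch that reads "freq down;up ..." lines and draws one value per parameter: a category is picked with probability proportional to its frequency, then each parameter is drawn uniformly within its bounds. The functions parse_discrete and draw_discrete are hypothetical and are not part of EggLib; the sketch raises ValueError where the library raises PriorParseError.

```python
import random


def parse_discrete(string):
    """Parse lines of the form 'freq down1;up1 down2;up2 ...'."""
    categories = []
    for line in string.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        freq = float(tokens[0])
        bounds = []
        for token in tokens[1:]:
            down, up = token.split(";")  # ValueError if not exactly two items
            bounds.append((float(down), float(up)))
        categories.append((freq, bounds))
    return categories


def draw_discrete(categories, rng):
    """Pick a category by weight, then draw uniformly within its bounds."""
    weights = [freq for freq, _ in categories]
    _, bounds = rng.choices(categories, weights=weights, k=1)[0]
    return [rng.uniform(down, up) for down, up in bounds]


categories = parse_discrete("0.5 0;1 10;20\n1.5 1;2 20;30\n")
values = draw_discrete(categories, random.Random(1))
```

The weights 0.5 and 1.5 do not sum to 1; random.Random.choices() normalizes them, which mirrors the statement that frequencies need not be relative.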

str()

Formats the content of the instance as a string, in a format appropriate for passing to parse().

class egglib.fitmodel.PriorDumb(random=None)

Bases: object

This prior doesn’t allow covariation between parameters or discrete categories. The probability distribution of each parameter is specified as a uniform or continuous statistical distribution. The list below presents the available distribution types, with their one-letter code and the list of parameters expected by the method add():

  • U: uniform probability between down and up.
  • E: exponential distribution of mean mean.
  • P: Poisson distribution of parameter p.
  • G: gamma distribution of parameter p.
  • N: normal distribution of mean m and standard deviation s.
  • F: parameter fixed to the value v.

Constructor argument random must be a Random instance and will be used to generate pseudorandom numbers.

add(type, *parameters)

Adds a parameter to the distribution. type is a one-letter code identifying the type of statistical distribution and parameters are the distribution’s parameters, given in the appropriate order.
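As an illustration of how a one-letter code and its parameters might map to a random draw, the sketch below uses Python's standard random module for four of the codes. It is not EggLib code: draw_value is a hypothetical helper, and the P and G codes are left out because their exact parameterization is not detailed in this page.

```python
import random


def draw_value(code, params, rng):
    """Draw one value for a (code, params) pair; U, E, N and F only."""
    code = code.upper()
    if code == "U":
        down, up = params
        return rng.uniform(down, up)
    if code == "E":
        (mean,) = params
        # expovariate() takes the rate, which is the inverse of the mean.
        return rng.expovariate(1.0 / mean)
    if code == "N":
        m, s = params
        return rng.gauss(m, s)
    if code == "F":
        (v,) = params
        return v
    raise ValueError("code not covered by this sketch: %r" % code)


rng = random.Random(42)
theta = draw_value("U", [0.0, 0.05], rng)
```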

clear()

Resets the instance.

draw()

Draws a ParamSample from the instance.

force_positive()

Enforces that drawn parameter values are >=0 (values <0 will be ignored). This flag is not cancelled if clear() is called.

New in version 2.0.2.

number_of_params()

Returns the number of parameters.

parse(string)

Imports data from the string string. The data format is: one token per parameter. Tokens can be arranged one per line or separated by any white space characters. Each token must follow the format X(...) where X is the one-letter code specifying the type of distribution (F, U, N, G, P or E) and ... represents the required parameter values in the appropriate order, separated by commas or semicolons. The parentheses can be replaced by square brackets and the model specification is case-independent. The function raises a PriorParseError in case of invalid format.
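The token format can be illustrated with a short regular-expression parser. This is a sketch under assumptions, not the library's parser: parse_dumb is a hypothetical name, tokens are assumed to contain no internal white space, and ValueError stands in for PriorParseError. The expected number of values per code follows the list of distribution types given earlier.

```python
import re

# One letter, an opening ( or [, the values, then a closing ) or ].
TOKEN = re.compile(r"^([A-Za-z])[(\[]([^()\[\]]*)[)\]]$")

# Number of values expected for each one-letter code.
EXPECTED = {"U": 2, "E": 1, "P": 1, "G": 1, "N": 2, "F": 1}


def parse_dumb(string):
    """Return a list of (code, values) pairs, one per token."""
    result = []
    for token in string.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError("invalid token: %r" % token)
        code = match.group(1).upper()  # codes are case-independent
        if code not in EXPECTED:
            raise ValueError("unknown distribution code: %r" % code)
        # Values may be separated by commas or semicolons.
        values = [float(v) for v in re.split(r"[,;]", match.group(2))]
        if len(values) != EXPECTED[code]:
            raise ValueError("wrong number of values in %r" % token)
        result.append((code, values))
    return result


spec = parse_dumb("U(0;1) e[0.5]\nN(0,1) F(3)")
```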

str()

Generates a string representation of the instance.

egglib.fitmodel.priors = [<class 'egglib.fitmodel.PriorDumb'>, <class 'egglib.fitmodel.PriorDiscrete'>, <class 'egglib.fitmodel.PriorParser'>]

This list contains the class objects (different from class instances, they are the classes themselves) corresponding to priors. They must define parse(), draw() and str() methods, a class-level string name and an informative docstring, but this is not (yet) enforced. This list is designed to help interactive commands automatically detect available priors.

Demographic model implementations

class egglib.fitmodel.SNM(recombination)

Bases: object

Standard Neutral Model: constant-sized single population. Allows optional recombination. Parameters: THETA, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.PEM(recombination)

Bases: object

Population Expansion Model (exponential growth), with optional recombination. Parameters: THETA, ALPHA, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.BNM(recombination)

Bases: object

Bottleneck Model, with optional recombination. Parameters:

  • THETA
  • DATE (date of the end of the bottleneck)
  • DUR (bottleneck duration)
  • BOTSIZE (size of the population during the bottleneck)
  • ANCSIZE (size of the ancestral population)
  • RHO (optional)

Note that if BOTSIZE is >1, the model can be generalized to a double instant change model.

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.GDB(recombination)

Bases: object

Composite-parameter bottleneck, after the formalization of the Galtier, Depaulis and Barton bottleneck model, with optional recombination. The bottleneck is implemented as a number of coalescent events occurring precisely at the time given by the DATE parameter. The STRENGTH is expressed as an amount of time of the normal coalescent process during which only coalescences occur (no migration, no mutation) and during which the global time counter doesn’t change. Ref: Galtier et al. Genetics 155:981-987, 2000.

Parameters: THETA, DATE, STRENGTH, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.GGDB(recombination)

Bases: object

Generalized Galtier, Depaulis and Barton with optional recombination. See GDB model. ANCSIZE gives the ancestral population size. Parameters: THETA, DATE, STRENGTH, ANCSIZE, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.IM(recombination)

Bases: object

Island Model, with optional recombination. The number of populations is automatically detected from the observed dataset. Parameters: THETA, MIGR, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.IMn(recombination)

Bases: object

Island Model with different population sizes, with optional recombination. The size of the first population is fixed to 1, so the sizes of all populations with index >1 must be specified as parameters. Parameters: THETA, MIGR, population sizes, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.IMG(recombination)

Bases: object

Island Model with exponential Growth, with optional recombination. Parameters: THETA, MIGR, ALPHA, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.IMiG(recombination)

Bases: object

Island Model with Independent exponential Growth in each population, with optional recombination. The growth rate of each population must be provided. Parameters: THETA, MIGR, ALPHA for all populations, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.IMiGn(recombination)

Bases: object

Island Model with Independent exponential Growth in each population, different population sizes and with optional recombination (the size of the first population is fixed to 1). The growth rate of each population must be provided, and the size of all populations save for the first one as well. Parameters: THETA, MIGR, growth rates, population sizes, RHO (optional).
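For the island models whose parameter list depends on the number of populations, the expected number of parameters can be worked out from the descriptions above. The helper below is a hypothetical sketch, not an EggLib function, and only encodes the counts stated in this page.

```python
def expected_parameters(model, npop, recombination):
    """Number of parameters for an island model with npop populations."""
    counts = {
        # THETA, MIGR
        "IM": 2,
        # THETA, MIGR, sizes of all populations except the first
        "IMn": 2 + (npop - 1),
        # THETA, MIGR, one growth rate per population
        "IMiG": 2 + npop,
        # THETA, MIGR, growth rates, sizes of all populations except the first
        "IMiGn": 2 + npop + (npop - 1),
    }
    n = counts[model]
    # RHO is added only when recombination is enabled.
    return n + 1 if recombination else n


n_params = expected_parameters("IMiGn", 3, True)
```

With three populations and recombination, IMiGn takes 2 + 3 + 2 + 1 = 8 values, so a ParamSample of length 8 would be needed in that case.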

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.SM(recombination)

Bases: object

Split Model (thinking forward), with optional recombination. The DATE parameter sets the split date and MIGR the migration rate after the split. Parameters: THETA, MIGR, DATE, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.AM(npop, recombination)

Bases: object

Admixture Model, with optional recombination. The DATE parameter sets the time when the ancestral populations joined and MIGR the migration rate between these populations. Note that the migration rate must not be 0, because the coalescence time might then be infinite. Present-day samples are not structured. Parameters: THETA, DATE, MIGR, RHO (optional). In abc_sample, specify this model as AM:k where k is the number of ancestral populations.

The constructor expects an integer giving the number of ancestral populations, followed by a boolean indicating whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.MRC(recombination)

Bases: object

Migration Rate Change, with optional recombination. MIGR0 is the current migration rate and MIGR1 the ancestral migration rate. Parameters: THETA, DATE, MIGR0, MIGR1, RHO (optional).

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

class egglib.fitmodel.DOM(recombination)

Bases: object

Domestication model, with optional recombination. Parameters:

  • THETA
  • SIZE (size of the cultivated population)
  • DATE (date of the bottleneck)
  • DUR (duration of the bottleneck)
  • STRENGTH (size of the bottleneck population)
  • MIGR (bidirectional migration rate)
  • RHO (optional)

The size of the wild population is 1. The domestication date is DATE+DUR.

The constructor expects a boolean to indicate whether recombination must be implemented.

generate(cfg, ps, random)

Generates a simulated dataset based on the passed sample configuration and the parameter sample.

egglib.fitmodel.models = [<class 'egglib.fitmodel.SNM'>, <class 'egglib.fitmodel.PEM'>, <class 'egglib.fitmodel.BNM'>, <class 'egglib.fitmodel.GDB'>, <class 'egglib.fitmodel.GGDB'>, <class 'egglib.fitmodel.IM'>, <class 'egglib.fitmodel.IMn'>, <class 'egglib.fitmodel.IMG'>, <class 'egglib.fitmodel.IMiG'>, <class 'egglib.fitmodel.IMiGn'>, <class 'egglib.fitmodel.MRC'>, <class 'egglib.fitmodel.AM'>, <class 'egglib.fitmodel.SM'>, <class 'egglib.fitmodel.DOM'>]

This list contains the class objects (different from class instances, they are the classes themselves) corresponding to demographic models. They must define a generate() method taking a configuration list and a param sample; their constructor must take 0 or more integer arguments followed by a boolean indicating whether recombination occurs. They must also define a class-level string name, a class-level list of strings parameters and an informative docstring. None of this is (yet) enforced. This list is designed to help interactive commands automatically detect available models.
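The conventions listed above can be pictured with a placeholder class and a lookup by name. Everything here is hypothetical: ToyModel simulates nothing, find_model is not an EggLib function, and the generate() signature is copied from the model entries earlier in this page.

```python
class ToyModel(object):
    """Hypothetical placeholder model following the listed conventions."""

    name = "TOY"              # class-level string name
    parameters = ["THETA"]    # class-level list of parameter names

    def __init__(self, recombination):
        # The constructor ends with a boolean recombination flag.
        self.recombination = bool(recombination)

    def generate(self, cfg, ps, random):
        # A real model would return a simulated dataset here.
        raise NotImplementedError("placeholder only")


def find_model(models, name):
    """Return the class in models whose name attribute matches."""
    for cls in models:
        if cls.name == name:
            return cls
    raise KeyError(name)


models = [ToyModel]
model_class = find_model(models, "TOY")
```

Storing classes rather than instances lets a caller choose the constructor arguments (such as the recombination flag) only after the model has been selected by name.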

egglib.fitmodel.add_model(name)

Adds the model name, defined in the file name.py. The model will be accessible in the fitmodel.models list.

Summary statistics implementations

class egglib.fitmodel.TPH

Bases: object

Computes the following statistics: thetaW, Pi, He (averaged over all loci).

class egglib.fitmodel.TPS

Bases: object

Computes the following statistics: total thetaW, Pi for each population, and Hudson’s Snn (nearest neighbor statistic). The number of statistics will be 2 + the number of populations. Statistics are averaged over all loci.

class egglib.fitmodel.SFS(number)

Bases: object

Computes the site frequency spectrum. The statistics are the average thetaW over all loci, followed by the relative frequencies of a user-defined number of bins of minor allele frequencies. For example, if the number of bins is 4, the 5 statistics will be: average thetaW, then the proportion of all polymorphic sites from all loci with minor allele frequency <=0.125, >0.125 and <=0.25, >0.25 and <=0.375, and >0.375 and <=0.5. Expected argument: number of categories in the spectrum.
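The binning rule in the 4-bin example (equal-width bins over minor allele frequencies up to 0.5, each closed on its upper bound) can be sketched as follows. The helpers sfs_bin and sfs_proportions are hypothetical and are not part of EggLib; the thetaW statistic is not computed here.

```python
def sfs_bin(freq, nbins):
    """Return the 0-based bin index of a minor allele frequency."""
    if not 0.0 < freq <= 0.5:
        raise ValueError("minor allele frequency must be in (0, 0.5]")
    for i in range(nbins):
        # The upper bound of bin i is (i + 1) * 0.5 / nbins, inclusive.
        if freq <= (i + 1) * 0.5 / nbins:
            return i
    return nbins - 1  # guard against floating-point rounding at 0.5


def sfs_proportions(minor_freqs, nbins):
    """Proportion of polymorphic sites falling in each bin."""
    counts = [0] * nbins
    for freq in minor_freqs:
        counts[sfs_bin(freq, nbins)] += 1
    total = len(minor_freqs)
    if total == 0:
        return [0.0] * nbins
    return [count / float(total) for count in counts]


spectrum = sfs_proportions([0.1, 0.1, 0.2, 0.5], 4)
```

With these four sites, two fall in the first bin, one in the second and one in the last, so the proportions are 0.5, 0.25, 0.0 and 0.25.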

class egglib.fitmodel.JFS(number)

Bases: object

Computes the joint frequency spectrum. This set of summary statistics requires two populations. The first two statistics are the average thetaW over all loci in each of the two populations, followed by the relative frequencies of a user-defined number of bins of the minor allele frequencies in both populations. If the number of bins is 4, there will be 2+4**2 = 18 statistics: average thetaW in the first population, average thetaW in the second population, then the proportion of mutations with the minor allele at frequency <=0.125 in both populations, then at frequency <=0.125 in the first population but at frequency >0.125 and <=0.25 in the second population, and so on. Expected argument: number of categories in one dimension of the joint spectrum.

There are some restrictions when using this summary statistics set: there must be exactly two populations; sequences for the first population must be consecutive; there must be exactly two alleles at each site and there cannot be any missing data.
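The size of the JFS statistics vector follows from the description: two thetaW values plus one cell per pair of bins, which for 4 bins gives 2 + 4**2 = 18. The sketch below also maps a pair of bin indices to a position in the flat vector. Both helpers are hypothetical, and the ordering (second population varying fastest) is an assumption extrapolated from the first two cells described above.

```python
def jfs_num_stats(nbins):
    """Two thetaW values plus an nbins x nbins joint spectrum."""
    return 2 + nbins ** 2


def jfs_flat_index(first_bin, second_bin, nbins):
    """Position of a joint-spectrum cell in the flat statistics vector.

    Assumption: cells are stored row by row, with the bin of the
    second population varying fastest, after the two thetaW values.
    """
    if not (0 <= first_bin < nbins and 0 <= second_bin < nbins):
        raise ValueError("bin index out of range")
    return 2 + first_bin * nbins + second_bin


total = jfs_num_stats(4)
```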

egglib.fitmodel.summstats = [<class 'egglib.fitmodel.SDZ'>, <class 'egglib.fitmodel.TPH'>, <class 'egglib.fitmodel.TPS'>, <class 'egglib.fitmodel.TPF'>, <class 'egglib.fitmodel.TPK'>, <class 'egglib.fitmodel.SFS'>, <class 'egglib.fitmodel.JFS'>, <class 'egglib.fitmodel.DIV'>]

This list contains the class objects (different from class instances, they are the classes themselves) corresponding to sets of summary statistics. They must define a compute() method taking a dataset and a sample configuration. This method must create a stats member containing a defined number of values (the statistics). The constructor may or may not take integer arguments. A name() class member and an informative docstring are also required. None of this is (yet) enforced. This list is designed to help interactive commands automatically detect available sets of summary statistics.
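A class following these conventions might look like the placeholder below. It is a hypothetical sketch rather than an EggLib statistics set: it ignores the dataset and only averages the usable alignment length, taken as the fourth item of each config tuple as described for Dataset.config(). Implementing name() as a class method is one possible reading of "class member".

```python
class MeanLengthSketch(object):
    """Hypothetical statistics set: mean usable alignment length."""

    def __init__(self):
        self.stats = []

    @classmethod
    def name(cls):
        # Assumption: name() is callable on the class itself.
        return "MLS"

    def compute(self, dataset, config):
        # Each config item is (total sample size, sample sizes per
        # population, number of outgroups, usable alignment length).
        lengths = [item[3] for item in config]
        mean = sum(lengths) / float(len(lengths)) if lengths else 0.0
        # The stats member always holds exactly one value here.
        self.stats = [mean]


statistics = MeanLengthSketch()
statistics.compute(None, [(10, [6, 4], 0, 500), (12, [6, 6], 1, 480)])
```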
