prosstt.count_model module

This module contains all functions that pertain to the model that generates UMI count data for the simulated cells.

prosstt.count_model.generate_negbin_params(tree, mean_alpha=0.2, mean_beta=2, a_scale=1.5, b_scale=1.5)

Generate default hyperparameters for the negative binomial distributions that are used to simulate UMI count data.

For motivation of the choice of distribution and default values, please refer to the paper.

Parameters:
  • tree (Tree) – A lineage tree
  • mean_alpha (float, optional) – The average alpha value
  • mean_beta (float, optional) – The average beta value
  • a_scale (float, optional) – The standard deviation for alpha
  • b_scale (float, optional) – The standard deviation for beta
Returns:

  • alphas (ndarray) – Alpha values for each gene
  • betas (ndarray) – Beta values for each gene

prosstt.count_model.get_pr_amp(mu_amp, s2_amp, ksi)

Calculate the parameters of the negative binomial that describes the distribution of the original transcripts before amplification. We assume that the amplification has no sequence bias and is the same for all transcripts.

Parameters:
  • mu_amp (float) – Mean expression of the amplification.
  • s2_amp (float) – Variance of the amplification.
  • ksi (int) – Number of initial transcripts present.
Returns:

  • p_amp (float) – The probability of success of the Bernoulli trial.
  • r_amp (float) – The number of “failures” of the Bernoulli trial.
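
One possible reading of this calculation is sketched below; it is an illustration under stated assumptions, not the library's implementation. It assumes that each of the ksi transcripts is amplified independently by a negative binomial with mean mu_amp and variance s2_amp, parametrized so that the mean is p*r/(1-p) and the variance is p*r/(1-p)^2. Under that assumption the total is again negative binomial with the same p and with r multiplied by ksi. The name pr_amp_sketch is hypothetical.

```python
def pr_amp_sketch(mu_amp, s2_amp, ksi):
    """Hypothetical helper, not part of prosstt.

    Assumes mean = p*r/(1-p) and variance = p*r/(1-p)^2 for a single
    amplified transcript, and independent amplification of each of the
    ksi transcripts (assumptions of this sketch).
    """
    if s2_amp <= mu_amp:
        raise ValueError("variance must exceed the mean")
    # Moment matching for one transcript.
    p_amp = (s2_amp - mu_amp) / s2_amp
    r_single = mu_amp**2 / (s2_amp - mu_amp)
    # Sum of ksi independent negative binomials with equal p: r adds up.
    r_amp = ksi * r_single
    return p_amp, r_amp


p_amp, r_amp = pr_amp_sketch(mu_amp=4.0, s2_amp=8.0, ksi=5)
total_mean = p_amp * r_amp / (1 - p_amp)       # ksi * mu_amp
total_var = p_amp * r_amp / (1 - p_amp) ** 2   # ksi * s2_amp
```

With these inputs the mean and variance of the total scale linearly with ksi, which is what independent, unbiased amplification would imply.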

prosstt.count_model.get_pr_umi(a, b, m)

Calculate parameters for my_negbin from the mean and variance of the distribution.

For single-cell RNA sequencing data we assume that the distribution of the transcripts is described by a negative binomial whose variance s^2 depends on the mean mu through the relation s^2 = a*mu^2 + b*mu.

Parameters:
  • a (float) – Coefficient for the quadratic term. Dominates for high mean expression.
  • b (float) – Coefficient for the linear term. Dominates for low mean expression.
  • m (float) – Mean expression of a gene.
Returns:

  • p (float) – The probability of success of the Bernoulli trial.
  • r (float) – The number of “failures” of the Bernoulli trial.
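
As a rough illustration of the mean-variance relation above, the following sketch derives p and r from a, b and m by matching moments. It assumes the parametrization in which the mean is p*r/(1-p) and the variance is p*r/(1-p)^2; the name pr_from_mean_sketch is hypothetical, and the library's own implementation may differ.

```python
def pr_from_mean_sketch(a, b, m):
    """Hypothetical helper: match moments to s^2 = a*m^2 + b*m.

    Assumes mean = p*r/(1-p) and variance = p*r/(1-p)^2, which is an
    assumption of this sketch rather than a documented fact.
    """
    s2 = a * m**2 + b * m
    if s2 <= m:
        raise ValueError("variance must exceed the mean (overdispersion)")
    # variance / mean = 1 / (1 - p)  =>  p = (s2 - m) / s2
    p = (s2 - m) / s2
    # mean = p * r / (1 - p)         =>  r = m^2 / (s2 - m)
    r = m**2 / (s2 - m)
    return p, r


p, r = pr_from_mean_sketch(a=0.2, b=2.0, m=10.0)
mean = p * r / (1 - p)       # recovers m
var = p * r / (1 - p) ** 2   # recovers a*m^2 + b*m
```

Note that the moment matching only works when the variance exceeds the mean, which is why the sketch rejects inputs with a*m^2 + b*m <= m.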

prosstt.count_model.get_pr_umi_atom(a, b, m)

Calculate parameters for my_negbin from the mean and variance of the distribution.

For single-cell RNA sequencing data we assume that the distribution of the transcripts is described by a negative binomial whose variance s^2 depends on the mean mu through the relation s^2 = a*mu^2 + b*mu.

Parameters:
  • a (float) – Coefficient for the quadratic term. Dominates for high mean expression.
  • b (float) – Coefficient for the linear term. Dominates for low mean expression.
  • m (float) – Mean expression of a gene.
Returns:

  • p (float) – The probability of success of the Bernoulli trial.
  • r (float) – The number of “failures” of the Bernoulli trial.

prosstt.count_model.lognegbin(x, theta)

Alternative formulation of the log negative binomial distribution pmf. scipy does not support the extended definition so we have to supply it ourselves.

Parameters:
  • x (int) – The random variable.
  • theta (real array [p, r]) – p is the probability of success of the Bernoulli trial and r the number of “failures”.
Returns:

  • The logarithm of the probability that a discrete random variable is exactly equal to some value.

class prosstt.count_model.my_negbin(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)

Bases: scipy.stats._distn_infrastructure.rv_discrete

Class definition for the alternative negative binomial pmf so that we can sample it using rvs().

prosstt.count_model.negbin(x, theta)

Alternative formulation of the negative binomial distribution pmf. scipy does not support the extended definition so we have to supply it ourselves.

Parameters:
  • x (int) – The random variable.
  • theta (real array [p, r]) – p is the probability of success of the Bernoulli trial and r the number of “failures”.
Returns:

  • The probability that a discrete random variable is exactly equal to some value.
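
To make the "extended definition" concrete, here is a minimal sketch of a negative binomial pmf with real-valued r, together with its logarithm. It assumes the form pmf(x) = Gamma(x + r) / (x! * Gamma(r)) * p^x * (1 - p)^r, which is consistent with a mean of p*r/(1-p); whether this matches the library's exact formula is an assumption. The names lognegbin_sketch and negbin_sketch are hypothetical.

```python
import math


def lognegbin_sketch(x, theta):
    """Hypothetical log-pmf with real-valued r (assumed parametrization)."""
    p, r = theta
    return (math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
            + x * math.log(p) + r * math.log1p(-p))


def negbin_sketch(x, theta):
    """Hypothetical pmf, obtained by exponentiating the log-pmf."""
    return math.exp(lognegbin_sketch(x, theta))


# Sanity check: the pmf should sum to ~1 and have mean p*r/(1-p).
theta = (0.75, 10.0 / 3.0)
support = range(2000)  # truncation; the remaining tail is negligible here
total = sum(negbin_sketch(x, theta) for x in support)
mean = sum(x * negbin_sketch(x, theta) for x in support)
```

Working in log space with lgamma avoids overflow in the gamma functions for large counts, which is a common reason to provide a separate log-pmf.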

class prosstt.count_model.sum_negbin(a=0, b=inf, name=None, badvalue=None, moment_tol=1e-08, values=None, inc=1, longname=None, shapes=None, extradoc=None, seed=None)

Bases: scipy.stats._distn_infrastructure.rv_discrete

Class definition for the convolved negative binomial pmf that describes non-UMI data.