prosstt.sim_utils module

This module contains utility functions for the simulations, such as a printed progress bar, functions to perform quality checks, functions to create and assign groups, and functions to pick between alternatives when multiple options are possible.

prosstt.sim_utils.adjust_to_parent(relative_means, current, topology)

Adds a vector to the relative means of the current branch such that its first row is equal to the last row of the relative means of the branch that precedes it.

Parameters:
  • relative_means (Series) – Relative mean expression for all genes on every (currently available) lineage tree branch.
  • current (int) – The current branch to be adjusted.
  • topology (numpy.ndarray) – The tree topology in array format.
Returns:

res – Adjusted relative expression matrix for the current branch.

Return type:

numpy.ndarray
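
The adjustment described above amounts to adding one offset per column. The following is a minimal sketch, not the library's implementation: it assumes that rows are pseudotime points and columns are genes, and the helper name `shift_to_parent` is hypothetical.

```python
import numpy as np

def shift_to_parent(child, parent):
    """Shift `child` so that its first row equals the last row of `parent`."""
    # One offset per column (assumed here to be one per gene).
    offset = parent[-1] - child[0]
    # Broadcasting adds the same offset vector to every row of `child`.
    return child + offset

parent = np.array([[1.0, 2.0], [1.5, 2.5]])
child = np.array([[0.0, 0.0], [0.2, -0.1]])
adjusted = shift_to_parent(child, parent)
```

Because the same vector is added to every row, the shape of the child trajectory is preserved and only its starting point moves.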

prosstt.sim_utils.assign_branches(branch_times, timezone)

Assigns a branch to every timezone:

        -- T[1]------
-T[0]--|          -- T[3]------
        -- T[2]--|
                  -- T[4]-
timezones:
---0---|----1----|-2-|-3--|-4--

timezone   branches
0          0
1          1, 2
2          1, 3, 4
3          3, 4
4          3

A time point in timezone i can belong to one of k possible branches.

Parameters:
  • branch_times (list of int lists) – The pseudotime at which branches start and end.
  • timezone (int array) – Array that contains the timezone information for each pseudotime point.
Returns:

res – A list of the possible branches for each timezone.

Return type:

list of int lists

prosstt.sim_utils.belongs_to(timezone, branch)

Checks whether a timezone start and end are contained within the pseudotime of a branch.

Timezones are constructed such that they don’t go over branch boundaries. This method is used to determine which branches are possible for a timezone.

Parameters:
  • timezone (int array) – The pseudotime at which the timezone starts and ends.
  • branch (int array) – The pseudotime at which the branch starts and ends.
Returns:

Whether the timezone is contained within the branch.

Return type:

bool
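
The containment check, and how it yields the possible branches for each timezone, can be sketched as follows. The start and end pseudotimes are hypothetical values chosen to mirror the diagram under assign_branches, and `contained_in` is an illustrative helper rather than the library function.

```python
def contained_in(timezone, branch):
    """Return True if the timezone [start, end] lies within the branch [start, end]."""
    return branch[0] <= timezone[0] and timezone[1] <= branch[1]

# Hypothetical [start, end] pseudotimes for branches T[0]..T[4].
branch_times = [[0, 9], [10, 29], [10, 19], [20, 39], [20, 34]]
# Hypothetical [start, end] pseudotimes for timezones 0..4.
timezones = [[0, 9], [10, 19], [20, 29], [30, 34], [35, 39]]

# For every timezone, collect the branches that fully contain it.
possible = [
    [b for b, times in enumerate(branch_times) if contained_in(tz, times)]
    for tz in timezones
]
print(possible)  # [[0], [1, 2], [1, 3, 4], [3, 4], [3]]
```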

prosstt.sim_utils.bfs_finder(graph, start)

Perform a breadth-first search where the graph is a list of connections.

Parameters:
  • graph (numpy.ndarray) – A numpy array of shape (N, 2). Every row [a, b] describes a connection from branch a to branch b.
  • start (int) – The root node from which to start the traversal.
Returns:

  • output (numpy.ndarray) – The input graph sorted by breadth-first traversal order.

Originally answered by StackOverflow user https://stackoverflow.com/users/2988730 for question https://stackoverflow.com/questions/50589804.
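
A breadth-first ordering of an edge list can be sketched as below. This is an illustrative version, not necessarily the approach used in the library; it assumes the graph is a tree (no cycles), and the name `bfs_edges` is hypothetical.

```python
from collections import deque

import numpy as np

def bfs_edges(graph, start):
    """Return the rows of `graph` reordered by breadth-first traversal from `start`."""
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # All connections that leave the current node.
        for row in graph[graph[:, 0] == node]:
            order.append(row)
            queue.append(row[1])
    return np.array(order)

topology = np.array([[1, 3], [0, 1], [2, 5], [0, 2], [1, 4]])
sorted_topology = bfs_edges(topology, 0)
print(sorted_topology.tolist())  # [[0, 1], [0, 2], [1, 3], [1, 4], [2, 5]]
```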

prosstt.sim_utils.bifurc_adjust(child, parent)

Adjust two matrices so that the last line of one equals the first of the other.

Parameters:
  • child – The matrix to be adjusted.
  • parent – The matrix to adjust to.

prosstt.sim_utils.breadth_first_branches(tree)

Performs a breadth-first traversal of the tree topology.

Parameters:
  • tree (Tree) – A lineage tree object.
Returns:

bfs – The tree branches in the order of traversal (breadth-first).

Return type:

list

prosstt.sim_utils.calc_relat_means(tree, programs, coefficients)

Calculate relative mean expression for a lineage tree given the expression programs and the coefficient matrix that contains the contribution of each expression program to each gene.

Parameters:
  • tree (Tree) – A lineage tree object.
  • programs (Series) – Relative expression for all expression programs on every branch of the lineage tree.
  • coefficients (numpy.ndarray) – Array that contains the contribution weight of each expression program for each gene.

prosstt.sim_utils.calc_scalings(cells, scale=True, scale_v=0.7)

Obtain library size factors for each cell.

Parameters:
  • cells (int) – The number of cells
  • scale (bool, optional) – Whether to simulate different scaling factors for each cell
  • scale_v (float, optional) – The standard deviation of the library size distribution (log-normal distribution around 0)
Returns:

scalings – A library size factor for each cell

Return type:

numpy.ndarray
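
Drawing library size factors from a log-normal distribution centred on 0 in log space might look like the sketch below. Returning all ones when `scale` is false is an assumption about the intended behaviour, and both the `rng` argument and the name `library_size_factors` are hypothetical.

```python
import numpy as np

def library_size_factors(cells, scale=True, scale_v=0.7, rng=None):
    """Sample one library size factor per cell (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    if not scale:
        # Assumption: without scaling, every cell gets a factor of 1.
        return np.ones(cells)
    # Normal in log space with mean 0 and standard deviation scale_v.
    return np.exp(rng.normal(loc=0.0, scale=scale_v, size=cells))

factors = library_size_factors(100, rng=np.random.default_rng(42))
unscaled = library_size_factors(100, scale=False)
```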

prosstt.sim_utils.create_groups(no_programs, no_genes)

Returns a list of the groups to which each gene belongs.

Each gene g is assigned one of no_programs possible groups twice (random draw with replacement).

Parameters:
  • no_programs (int) – Number of modules (expression programs).
  • no_genes (int) – Number of genes.
Returns:

groups – A list of the two modules to which each gene belongs.

Return type:

list of ints
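
Drawing two groups per gene with replacement can be sketched as follows. The (no_genes, 2) array layout is an assumption made for illustration and may differ from what the library returns; `draw_groups` and the `rng` argument are hypothetical.

```python
import numpy as np

def draw_groups(no_programs, no_genes, rng=None):
    """Draw two program indices per gene, with replacement (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Each row holds the two groups drawn for one gene; the two may coincide.
    return rng.integers(0, no_programs, size=(no_genes, 2))

groups = draw_groups(no_programs=3, no_genes=10, rng=np.random.default_rng(0))
```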

prosstt.sim_utils.diverging_parallel(branches, programs, genes, tol=0.5)

Calculate if the expression programs in all pairs of parallel branches are diverging enough to make the branches distinguishable from each other.

Parameters:
  • branches (list) – A list of pairs of parallel branches.
  • programs (Series) – Relative expression for all expression programs on every branch of the lineage tree.
  • genes (int) – The number of genes included in the lineage tree.
  • tol (float, optional) – The fraction of genes (between 0 and 1) that must have anticorrelated expression patterns over pseudotime in order for the branches to be considered diverging.
Returns:

diverging – An array of boolean values: whether each pair of parallel branches diverges or not.

Return type:

numpy.ndarray

prosstt.sim_utils.find_parallel(tree, programs, branch)

Find all branches that are parallel to the input branch (have same parent branch).

Parameters:
  • tree (Tree) – A lineage tree object.
  • programs (Series) – Relative expression for all expression programs on every branch of the lineage tree.
  • branch (int) – The branch to examine.
Returns:

A list of branches that are parallel to the input branch, including the branch itself.

Return type:

list

prosstt.sim_utils.flat_order(n)

Map from the indices of a flat array of size n(n-1)/2 to an upper triangular matrix of size n × n.

Parameters:
  • n (int) – Number of options to combine.

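
A flat array of size n(n-1)/2 has one entry per position strictly above the diagonal of an n × n matrix. The sketch below builds such a mapping with NumPy; the row-major ordering is an assumption and may not match the order used by the library, and `flat_to_upper` is a hypothetical name.

```python
import numpy as np

def flat_to_upper(n):
    """Return the (row, column) position for every flat index (illustrative sketch)."""
    # k=1 skips the diagonal, leaving n(n-1)/2 positions.
    rows, cols = np.triu_indices(n, k=1)
    return list(zip(rows.tolist(), cols.tolist()))

pairs = flat_to_upper(4)
print(pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```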
prosstt.sim_utils.max_relat_exp(tree, relative_means)

Finds maximum relative gene expression for each gene along the lineage tree.

Parameters:
  • tree (Tree) – A lineage tree object.
  • relative_means (Series) – Relative mean expression for all genes on every lineage tree branch.
Returns:

maxes – An array with the maximum relative expression of each gene along the lineage tree.

Return type:

numpy.ndarray

prosstt.sim_utils.pearson_between_programs(genes, prog1, prog2)

Calculate the Pearson correlation coefficient between two expression programs for all genes.

Parameters:
  • genes (int) – The number of genes in the lineage tree
  • prog1 (numpy.ndarray) – The first expression program
  • prog2 (numpy.ndarray) – The second expression program
Returns:

pearson – The Pearson correlation coefficient for all genes in the two programs.

Return type:

numpy.ndarray
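
A per-gene Pearson correlation between two branches, and a divergence test in the spirit of the `tol` parameter of diverging_parallel, can be sketched as below. The sketch assumes that rows are pseudotime points and columns are genes, and that "anticorrelated" means a negative coefficient; both are assumptions, and `columnwise_pearson` is a hypothetical helper.

```python
import numpy as np

def columnwise_pearson(a, b):
    """Pearson correlation between matching columns of `a` and `b`."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    numerator = (a_c * b_c).sum(axis=0)
    denominator = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    return numerator / denominator

t = np.linspace(0, 1, 50)
branch_a = np.column_stack([t, t ** 2, np.sin(t)])
branch_b = np.column_stack([-t, t ** 2, -np.sin(t)])

r = columnwise_pearson(branch_a, branch_b)
# Two of the three genes are anticorrelated, so the pair passes a tolerance of 0.5.
diverging = np.mean(r < 0) >= 0.5
```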

prosstt.sim_utils.pick_branch(tree, pseudotime, timezones, assignments)

Picks one of the possible branches for a cell at a given time point.

Parameters:
  • tree (Tree) – A lineage tree object.
  • pseudotime (int) – A pseudotime point.
  • timezones (int array) – The pseudotimes at which the timezones start and end.
  • assignments (int array) – A list of the possible branches for each timezone.
Returns:

branch – The branch to which the cell belongs.

Return type:

int

prosstt.sim_utils.pick_branches(tree, pseudotime)

Randomly pick a corresponding branch for a list of pseudotime values.

Parameters:
  • tree (Tree) – A lineage tree object.
  • pseudotime (list) – A list of pseudotime values.
Returns:

branches – Branch assignments for each pseudotime value.

Return type:

list

prosstt.sim_utils.print_progress(iteration, total, prefix='', suffix='', decimals=1)

Call in a loop to create a terminal-friendly text progress bar. Contributed by Greenstick on stackoverflow.com/questions/3173320.

Parameters:
  • iteration (int) – Current iteration.
  • total (int) – Total number of iterations.
  • prefix (str, optional) – Prefix string before the progress bar.
  • suffix (str, optional) – Suffix string after the progress bar.
  • decimals (int, optional) – Number of decimal places shown in the completion percentage (must be non-negative).

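
A terminal progress bar of this kind is typically redrawn in place with a carriage return. The sketch below illustrates the idea; the bar characters, the `bar_length` argument, and the helper names are assumptions and not part of the documented function.

```python
import sys

def format_progress(iteration, total, prefix="", suffix="", decimals=1, bar_length=20):
    """Build the text of a progress bar (illustrative sketch)."""
    fraction = iteration / total
    filled = int(round(bar_length * fraction))
    bar = "#" * filled + "-" * (bar_length - filled)
    percent = f"{100 * fraction:.{decimals}f}"
    return f"{prefix} |{bar}| {percent}% {suffix}"

def show_progress(iteration, total, **kwargs):
    """Redraw the bar on the same terminal line."""
    sys.stdout.write("\r" + format_progress(iteration, total, **kwargs))
    if iteration == total:
        sys.stdout.write("\n")  # finish the line once the loop is done
    sys.stdout.flush()

for i in range(1, 6):
    show_progress(i, 5, prefix="Progress:", suffix="done")
```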
prosstt.sim_utils.process_timeseries_input(series_points, cells, point_std)

Process the input of sample_pseudotime_series to make everything the same shape.

Parameters:
  • series_points (list) – The pseudotime sample points for the time series experiment
  • cells (int or list) – Either the total number of cells to be sampled (in which case it is split equally among all sample points) or the number of cells to be sampled at each sample point
  • point_std (float or list) – Standard deviation of cell density around each sample point. If it is a float, the same value is used for every sample point.
Returns:

  • series_points (numpy.ndarray) – The pseudotime sample points for the time series experiment
  • cells (numpy.ndarray) – The cells to be sampled at each sample point of the time series experiment
  • point_std (numpy.ndarray) – The standard deviation of cell density around each sample point of the time series experiment
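
Bringing scalar and list inputs to a common shape can be sketched as follows. How a remainder is handled when the total number of cells does not divide evenly is not stated above; this sketch simply uses floor division, which is an assumption, and `normalise_series_input` is a hypothetical name.

```python
import numpy as np

def normalise_series_input(series_points, cells, point_std):
    """Expand scalar inputs so that all three outputs have one entry per sample point."""
    series_points = np.asarray(series_points, dtype=float)
    n_points = series_points.size
    if np.isscalar(cells):
        # Assumption: split the total equally and drop any remainder.
        cells = np.full(n_points, cells // n_points, dtype=int)
    else:
        cells = np.asarray(cells, dtype=int)
    if np.isscalar(point_std):
        point_std = np.full(n_points, point_std, dtype=float)
    else:
        point_std = np.asarray(point_std, dtype=float)
    return series_points, cells, point_std

points, n_cells, stds = normalise_series_input([0, 10, 20], 300, 2.0)
```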

prosstt.sim_utils.random_partition(k, iterable)

Randomly partition an iterable into groups of almost equal size.

Parameters:
  • k (int) – How many partitions to create.
  • iterable (array) – The iterable to be partitioned.
Returns:

results (list of int lists) – The k partitions.

Contributed by kennytm on stackoverflow.com/questions/3760752.
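
One way to obtain random groups whose sizes differ by at most one is to shuffle the items and deal them out in turn. This is a sketch of that idea, not necessarily the approach of the referenced answer; `partition_evenly` and the `seed` argument are hypothetical.

```python
import random

def partition_evenly(k, iterable, seed=None):
    """Shuffle the items and deal them round-robin into k groups."""
    rng = random.Random(seed)
    items = list(iterable)
    rng.shuffle(items)
    # Group i receives items i, i + k, i + 2k, ... of the shuffled list.
    return [items[i::k] for i in range(k)]

parts = partition_evenly(3, range(10), seed=0)
sizes = [len(p) for p in parts]
```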

prosstt.sim_utils.simulate_base_gene_exp(tree, relative_means, abs_max=5000, gene_mean=0.8, gene_std=1)

Samples appropriate base expression values for each gene. The criterion applied is that the absolute average gene expression does not surpass a certain threshold.

Parameters:
  • tree (Tree) – A lineage tree object.
  • relative_means (Series) – Relative mean expression for all genes on every lineage tree branch
  • abs_max (int, optional) – Highest allowed value for the absolute average expression of a gene along the lineage tree
  • gene_mean (float, optional) – Average of the log-normal distribution from which the base gene expression values are sampled
  • gene_std (float, optional) – Standard deviation of the log-normal distribution from which the base gene expression values are sampled
Returns:

base_gene_exp – An array that contains base expression values for each gene

Return type:

numpy.ndarray
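
The threshold criterion can be met by resampling any gene whose expression would exceed `abs_max`. The sketch below works on the maximum relative expression per gene (as produced by a function like max_relat_exp). The rejection loop, the `rng` argument, and the name `sample_base_expression` are assumptions made for illustration.

```python
import numpy as np

def sample_base_expression(max_relative, abs_max=5000, gene_mean=0.8, gene_std=1.0, rng=None):
    """Sample log-normal base expression, resampling genes that exceed abs_max."""
    rng = np.random.default_rng() if rng is None else rng
    max_relative = np.asarray(max_relative, dtype=float)
    base = np.exp(rng.normal(gene_mean, gene_std, size=max_relative.size))
    too_high = base * max_relative > abs_max
    while too_high.any():
        # Redraw only the genes that violate the threshold.
        base[too_high] = np.exp(rng.normal(gene_mean, gene_std, size=too_high.sum()))
        too_high = base * max_relative > abs_max
    return base

max_relative = np.array([1.0, 10.0, 100.0])
base = sample_base_expression(max_relative, rng=np.random.default_rng(0))
```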

prosstt.sim_utils.test_correlation(W, k, cutoff)

For a column of a matrix, test if previous columns correlate with it.

Parameters:
  • W (numpy array) – The matrix to test.
  • k (int) – Compare columns from 0 to k-1 with column k.
  • cutoff (float) – Correlations above the cutoff are considered too high. The value should lie between 0 and 1, but this is not checked explicitly.
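
A check of this kind can be sketched with `numpy.corrcoef`. The return value is not documented above, so returning a boolean is an assumption, as is comparing the signed coefficient (rather than its absolute value) against the cutoff; `correlates_with_previous` is a hypothetical name.

```python
import numpy as np

def correlates_with_previous(W, k, cutoff):
    """Return True if any of columns 0..k-1 correlates with column k above `cutoff`."""
    for i in range(k):
        r = np.corrcoef(W[:, i], W[:, k])[0, 1]
        if r > cutoff:
            return True
    return False

W = np.array([
    [1.0, 4.0, 2.0],
    [2.0, 3.0, 4.0],
    [3.0, 2.0, 6.0],
    [4.0, 1.0, 8.5],
])
print(correlates_with_previous(W, 1, 0.9))  # False: column 1 is anticorrelated with column 0
print(correlates_with_previous(W, 2, 0.9))  # True: column 2 closely follows column 0
```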