Generate tree

The aim of this step is to generate a Tree object. For illustration, we will create the lineage tree for a differentiation with a single cell fate decision (a single bifurcation) in which one side of the differentiation lasts longer than the other. The simulated data should contain a total of 1000 genes, regulated by 15 expression programs. The lineage tree looks like this:

          100
   70   ---B------
---A---|
        ---C--
          60

This is a single bifurcation in which branch A is connected to branches B and C. The pseudotime lengths of the branches are A: 70, B: 100 and C: 60. No part of the lineage tree is populated more densely than any other: a cell picked at random from the tree is equally likely to come from any position.
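
To make "equally probable" concrete, here is a minimal sketch of a uniform density over the example tree. The helper name `uniform_density` and the dictionary-of-lists layout are assumptions made for illustration; they are not taken from PROSSTT and may differ from how the library stores density internally.

```python
# Branch lengths in pseudotime units, as in the example tree.
time = {"A": 70, "B": 100, "C": 60}


def uniform_density(time):
    """Give every pseudotime position on every branch the same probability.

    Hypothetical helper: the layout (one list per branch) is an assumption.
    """
    total = sum(time.values())  # 230 positions in the example tree
    return {branch: [1.0 / total] * length for branch, length in time.items()}


density = uniform_density(time)

# Probability that a randomly picked cell lies on each branch.
branch_mass = {branch: sum(values) for branch, values in density.items()}
total_mass = sum(branch_mass.values())
```

Under this assumption the probability of landing on a branch is proportional to its length, so the longer arm B receives more cells than C even though no single position is favoured.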

A Tree object for this topology can be created in several ways:

Manual definition

The Tree() constructor can be called directly with a topology that describes branch connectivity, a pseudotime length for each branch, the total number of branches and branch points, as well as values for the number of genes and expression programs (modules) (see Simulate average gene expression along tree):

from prosstt.tree import Tree
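t = Tree(topology=[["A", "B"], ["A", "C"]],  # branch connectivity: A leads to B and to C
         time={"A":70, "B":100, "C":60},     # pseudotime length of each branch
         num_branches=3,                     # total number of branches
         branch_points=1,                    # total number of branch points
         modules=15,                         # number of expression programs
         G=1000,                             # number of genes
         density=None,                       # None requests a uniform density
         root="A")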

If density=None is passed (or if no value for the parameter is given), PROSSTT will automatically calculate a uniform density for the lineage tree.
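
Because the manual definition states the same information several times (topology, time, num_branches, branch_points), it can help to derive the counts from the topology before calling the constructor. The following is a minimal sketch of such a check; `summarise_topology` is a hypothetical helper, not part of PROSSTT, and it assumes that a branch point is any branch with at least two children.

```python
from collections import Counter


def summarise_topology(topology, time, root):
    """Derive num_branches and branch_points from a list of [parent, child] pairs.

    Hypothetical helper for illustration; not part of PROSSTT.
    """
    child_counts = Counter(parent for parent, _ in topology)
    children = {child for _, child in topology}
    branches = set(child_counts) | children | {root}

    # Every branch needs a pseudotime length, and vice versa.
    if branches != set(time):
        raise ValueError("topology and time must name the same branches")
    # The root must not appear as the child of another branch.
    if root in children:
        raise ValueError("the root branch cannot have a parent")

    branch_points = sum(1 for count in child_counts.values() if count >= 2)
    return {"num_branches": len(branches), "branch_points": branch_points}


summary = summarise_topology(
    topology=[["A", "B"], ["A", "C"]],
    time={"A": 70, "B": 100, "C": 60},
    root="A",
)
```

For the example tree the summary contains 3 branches and 1 branch point, matching the values passed to Tree() above.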

From a Newick-formatted string

A Newick-formatted string can be used instead; in this case PROSSTT will parse the topology from the Newick string. Note that, just as in the manual definition, the Newick tree describes branches and their connections:

import newick
from prosstt.tree import Tree

newick_string = "(B:100,C:60)A:70;"
newick_tree = newick.loads(newick_string)
t = Tree.from_newick(newick_tree, modules=15, genes=1000, density=None)

The topology, time, num_branches, branch_points, and root parameters are parsed from the Newick tree, so they do not need to be supplied. The same considerations about density as before apply.
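
To show how the Newick string maps onto the parameters of the manual definition, here is a small stand-alone sketch that reads a simple Newick string (every node named, lengths given after a colon) into a topology, a time dictionary and a root. It is written from scratch for illustration: `parse_simple_newick` is a hypothetical function, it is neither the parser of the newick package nor the one used by PROSSTT, and it does not handle quoted names or comments.

```python
import re


def parse_simple_newick(text):
    """Return (topology, time, root) for a simple, fully named Newick string.

    Hypothetical helper for illustration only.
    """
    s = text.strip().rstrip(";")
    topology, time = [], {}
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # step over "("
            while True:
                children.append(parse_node())
                if s[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # step over ")"
                break
        # After the optional child list comes "name" or "name:length".
        match = re.match(r"([^:,()]*)(?::([0-9.eE+-]+))?", s[pos:])
        name, length = match.group(1), match.group(2)
        pos += match.end()
        time[name] = float(length) if length is not None else 0.0
        for child in children:
            topology.append([name, child])
        return name

    root = parse_node()
    return topology, time, root


topology, time, root = parse_simple_newick("(B:100,C:60)A:70;")
```

The label after a closing parenthesis names the parent branch, which is why A ends up as the root with B and C as its children.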

Automatic tree generation

For higher-order topologies (more than 3 leaves in the tree), multiple topological arrangements are possible (see also the Supplemental Material of the paper). As was the case in the MERLoT benchmark, users sometimes want to generate a tree topology with the number of bifurcations as the only input parameter. For example, the following creates a lineage tree with 4 bifurcations (9 branches) in which every branch is 50 pseudotime units long:

from prosstt.tree import Tree
import string

branch_names = list(string.ascii_uppercase[:9])
time = {branch: 50 for branch in branch_names}
t = Tree.from_random_topology(branch_points=4, time=time, modules=15, genes=1000)
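
The 9 in the example follows from the number of bifurcations: in a tree where every branch point splits into two, each bifurcation adds two branches to the root branch, giving 2n + 1 branches for n bifurcations. The sketch below builds the time dictionary for any n on that basis. `equal_length_times` is a hypothetical helper, and naming branches with uppercase letters simply follows the example above; whether PROSSTT expects particular branch names is not stated here.

```python
import string


def equal_length_times(branch_points, length):
    """Build a {branch name: length} dictionary for a bifurcating tree.

    Hypothetical helper; assumes every branch point is a bifurcation.
    """
    num_branches = 2 * branch_points + 1
    if num_branches > len(string.ascii_uppercase):
        # Single uppercase letters only cover up to 12 bifurcations (25 branches).
        raise ValueError("too many branches for single-letter names")
    names = string.ascii_uppercase[:num_branches]
    return {name: length for name in names}


times = equal_length_times(branch_points=4, length=50)
```

The resulting dictionary has the same content as the `time` dictionary built in the example and could be passed in its place.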