Object reference

CmdPipelineModule

class mockinbird.utils.pipeline.CmdPipelineModule(pipeline, keep_all=False, cfg_req=[])

Parent module for all modules that run shell commands

This class stores a list of commands in the _cmds variable and executes them on demand. It defines reasonable default behavior for all necessary pipeline methods.

Defining your own module requires the following steps:

  • override the constructor and pass the configuration requirements of the module
  • override the prepare method and
    • add file paths to temporary files to _tmp_files
    • add intermediate file paths to _intermed_files
    • add shell commands to _cmds
  • use _pipeline to access the paths to previously generated files and register files as module output

cleanup(keep_intermed=False)

Cleans up temporary and intermediate files

Args:
keep_intermed (boolean): if set to True, intermediate files are not removed

config_req

The list of config requirements

default_config_keys

A set with the keys of pre-defined configuration options

execute()

Execute all queued commands

module_info

The module info string

prepare(cfg)

Prepares the module by queuing commands to execute

This is the main method that queues commands and lists files that are to be cleaned up. Subclasses should call the parent method before providing their own implementation.

Args:
cfg (dict): dictionary of the configuration options
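
The steps for defining a module can be sketched as follows. So that the sketch runs without mockinbird installed, StubPipeline and StubCmdModule are hypothetical stand-ins that model only the attributes and methods named in this reference. The module name GzipModule, the "fastq" format, the "output_dir" config key, the plain-string shape of the cfg_req entries and the output file name are all assumptions made for illustration.

```python
import os
import shlex


class StubPipeline:
    """Hypothetical stand-in for Pipeline: tracks the newest file per format."""

    def __init__(self, initial_files):
        self._files = dict(initial_files)

    def get_curfile(self, fmt):
        return self._files[fmt]

    def upd_curfile(self, fmt, filepath):
        self._files[fmt] = filepath


class StubCmdModule:
    """Hypothetical stand-in for CmdPipelineModule (documented attributes only)."""

    def __init__(self, pipeline, keep_all=False, cfg_req=()):
        self._pipeline = pipeline
        self._cfg_req = list(cfg_req)
        self._cmds = []
        self._tmp_files = []
        self._intermed_files = []

    def prepare(self, cfg):
        # The real parent method may do more; here it only exists to be called.
        pass


class GzipModule(StubCmdModule):
    """Example module that compresses the current fastq file."""

    def __init__(self, pipeline):
        # Step 1: pass the configuration requirements to the parent constructor.
        super().__init__(pipeline, cfg_req=["output_dir"])

    def prepare(self, cfg):
        # Step 2: call the parent method first, then queue files and commands.
        super().prepare(cfg)
        in_file = self._pipeline.get_curfile("fastq")
        out_file = os.path.join(cfg["output_dir"], "reads.fastq.gz")
        self._intermed_files.append(out_file)
        self._cmds.append(
            "gzip -c %s > %s" % (shlex.quote(in_file), shlex.quote(out_file))
        )
        # Step 3: register the output so that later modules can find it.
        self._pipeline.upd_curfile("fastq", out_file)


pipeline = StubPipeline({"fastq": "reads.fastq"})
module = GzipModule(pipeline)
module.prepare({"output_dir": "out"})
```

In a real module the two stand-in classes would be replaced by Pipeline and CmdPipelineModule from mockinbird.utils.pipeline, and the queued commands would be run by execute().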

Pipeline

class mockinbird.utils.pipeline.Pipeline(initial_files, general_cfg, cfg_path)

The Pipeline queues modules and provides means for modules to communicate

cfg_path

path to the config file

cleanup()

Clean up files of all modules

This method invokes the cleanup method of all queued modules. Intermediate files of the last module are not cleaned up

get_config(cfg_name)

Get config section from global config dictionary

Args:
cfg_name (str): name of the config section to retrieve

get_curfile(fmt)

Get the path to the most recently created file of format fmt

Args:
fmt (str): file format of file to retrieve

Exits the program with exit-code 1 if the file was not yet queued

has_curfile(fmt)

Check if a file of format fmt is already available in the pipeline

schedule(module)

Append module to the list of scheduled modules

upd_curfile(fmt, filepath)

Register filepath as the most recently created file of format fmt
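
The interplay of has_curfile, get_curfile and upd_curfile can be illustrated with a small stand-in. FileRegistry is a hypothetical class, not the library implementation. It assumes that the initial files are given as a mapping from format to path, and the formats and file names are made up for the example.

```python
import sys


class FileRegistry:
    """Hypothetical stand-in that mimics the documented curfile methods."""

    def __init__(self, initial_files):
        # Maps a file format (e.g. "bam") to the newest path of that format.
        self._cur_files = dict(initial_files)

    def has_curfile(self, fmt):
        return fmt in self._cur_files

    def get_curfile(self, fmt):
        if fmt not in self._cur_files:
            # Documented behavior: exit with code 1 if no such file is known.
            print("no file of format %r available" % fmt, file=sys.stderr)
            sys.exit(1)
        return self._cur_files[fmt]

    def upd_curfile(self, fmt, filepath):
        self._cur_files[fmt] = filepath


registry = FileRegistry({"fastq": "reads.fastq"})
registry.upd_curfile("bam", "mapped.bam")
registry.upd_curfile("bam", "mapped.sorted.bam")  # replaces the earlier entry

newest_bam = registry.get_curfile("bam")
pileup_known = registry.has_curfile("mpileup")
```

Since get_curfile terminates the program for an unknown format, a module that can work with or without an optional input would check has_curfile first.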

Configuration

class mockinbird.utils.config_validation.Annot(type=<function Annot.<lambda>>, default=None, converter=<function Annot.<lambda>>, warn_if_missing=False)

Annot provides detailed information on a configuration option

It stores the following properties:

type (callable): initial conversion of raw value from the config file; mostly obsolete now

default: default value if option is not explicitly set

converter (callable): function to convert and validate the config option. May raise ValueError if an invalid value is passed

warn_if_missing: print a warning message if option is not provided by the user

converter

Function to convert and validate the given configuration value

default

Default value if configuration is not provided by the user

The default value None makes providing the config value mandatory.

type

Parse function for the initial conversion of the raw value from the config file

Note: starting from the introduction of the yaml config files, this function should be obsolete.

warn_if_missing

Print a warning if the config option is not provided and the default value is used instead
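
How the four properties might work together is sketched below. AnnotSketch and resolve_option are hypothetical stand-ins. The resolution order (apply type, then converter, and fall back to default only when the key is absent) is an assumption and is not taken from the library source. The option names and the positive helper are invented for the example.

```python
class AnnotSketch:
    """Hypothetical stand-in with the four documented Annot properties."""

    def __init__(self, type=lambda x: x, default=None,
                 converter=lambda x: x, warn_if_missing=False):
        self.type = type
        self.default = default
        self.converter = converter
        self.warn_if_missing = warn_if_missing


def resolve_option(key, annot, user_cfg):
    """Resolve one option; the order of the steps is an assumption."""
    if key not in user_cfg:
        if annot.default is None:
            # A default of None makes the option mandatory.
            raise ValueError("mandatory option %r is missing" % key)
        if annot.warn_if_missing:
            print("warning: %r not set, using default %r" % (key, annot.default))
        return annot.default
    return annot.converter(annot.type(user_cfg[key]))


def positive(value):
    """Example converter: validates and returns the value."""
    if value <= 0:
        raise ValueError("value must be positive")
    return value


annots = {
    "threads": AnnotSketch(int, default=1, converter=positive,
                           warn_if_missing=True),
    "genome": AnnotSketch(str),  # no default, therefore mandatory
}
user_cfg = {"genome": "genome.fa"}
resolved = {key: resolve_option(key, annot, user_cfg)
            for key, annot in annots.items()}
```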

mockinbird.utils.config_validation.boolean(bool_str)

Converts a string to bool

Only False, ‘no’, ‘0’ and ‘’ are interpreted as False, all other inputs are converted to True
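
The documented rule can be written down as a short sketch. boolean_sketch is a hypothetical re-implementation that follows the sentence above literally; whether the library compares case-insensitively or treats other spellings as False is not stated here, so the sketch compares exact values only.

```python
def boolean_sketch(bool_str):
    """Return False only for False, 'no', '0' and ''; True otherwise."""
    if bool_str is False:
        return False
    # Assumption: the strings are matched exactly, without case folding.
    return bool_str not in ("no", "0", "")


conversions = [(value, boolean_sketch(value))
               for value in (False, "no", "0", "", "yes", "1", True)]
```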

mockinbird.utils.config_validation.comma_sep_args(item_str)

Split the input string at the comma delimiter into a list

mockinbird.utils.config_validation.dnanuc_validator(dna_nuc)

Validates that the character is one of the four bases ‘A’, ‘C’, ‘G’ and ‘T’

mockinbird.utils.config_validation.dnastr_validator(dna_string)

Validates that a string contains only the bases ‘A’, ‘C’, ‘G’ and ‘T’

Uracil (‘U’) letters are converted to the DNA equivalent thymine (‘T’)
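
A minimal sketch of this conversion and check is given below. dnastr_sketch is a hypothetical re-implementation; returning the converted string, raising ValueError for invalid input and accepting upper-case letters only are assumptions.

```python
def dnastr_sketch(dna_string):
    """Convert U to T, then require that only A, C, G and T remain."""
    converted = dna_string.replace("U", "T")
    invalid = set(converted) - set("ACGT")
    if invalid:
        raise ValueError("invalid bases: %s" % ", ".join(sorted(invalid)))
    return converted


rna_motif = "ACGU"
dna_motif = dnastr_sketch(rna_motif)
```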

mockinbird.utils.config_validation.file_r_validator(path)

Validates a file and assures read permissions

mockinbird.utils.config_validation.file_rw_validator(path)

Validates a file and assures read and write permissions

mockinbird.utils.config_validation.id_converter(x)

Return the input. Equivalent to lambda x: x

mockinbird.utils.config_validation.in_set_validator(item, item_set)

Validates that item is a member of the set item_set

mockinbird.utils.config_validation.is_subset_validator(item_str, item_set)

Validates that the items in item_str form a subset of the set item_set

item_str is a comma separated list of items.
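
The subset check on a comma-separated string can be sketched as follows. is_subset_sketch is a hypothetical re-implementation; stripping whitespace around the items, returning the parsed list and raising ValueError are assumptions, and the plot names are invented.

```python
def is_subset_sketch(item_str, item_set):
    """Split item_str at commas and require every item to be in item_set."""
    # Assumption: whitespace around the individual items is ignored.
    items = [item.strip() for item in item_str.split(",")]
    unknown = [item for item in items if item not in item_set]
    if unknown:
        raise ValueError("unknown items: %s" % ", ".join(unknown))
    return items


allowed_plots = {"coverage", "transitions", "heatmap"}
selected_plots = is_subset_sketch("coverage, heatmap", allowed_plots)
```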

mockinbird.utils.config_validation.nonneg_integer(integer)

Validates that the input is a non-negative integer
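
As a sketch, such a validator could look like this. nonneg_integer_sketch is a hypothetical re-implementation; converting the input with int and returning the converted value are assumptions.

```python
def nonneg_integer_sketch(integer):
    """Convert the input to int and reject negative values."""
    # int() raises ValueError for strings such as "abc" or "1.5".
    value = int(integer)
    if value < 0:
        raise ValueError("%d is negative" % value)
    return value


min_length = nonneg_integer_sketch("20")
```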

mockinbird.utils.config_validation.rel_file_r_validator(path, cfg_path)

Validates a file and assures read permissions

The path can either be absolute or relative to the parent folder of cfg_path

mockinbird.utils.config_validation.rel_file_rw_validator(path, cfg_path)

Validates a file and assures read and write permissions

The path can either be absolute or relative to the parent folder of cfg_path
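
The path resolution shared by the rel_* validators can be sketched with the standard library. resolve_cfg_relative and rel_file_r_sketch are hypothetical helpers; raising ValueError and returning the resolved path are assumptions, and the file names are invented.

```python
import os
import tempfile


def resolve_cfg_relative(path, cfg_path):
    """Interpret path relative to the folder that contains cfg_path."""
    cfg_dir = os.path.dirname(os.path.abspath(cfg_path))
    # os.path.join returns path unchanged when path is already absolute.
    return os.path.join(cfg_dir, path)


def rel_file_r_sketch(path, cfg_path):
    """Require an existing, readable file and return its resolved path."""
    full_path = resolve_cfg_relative(path, cfg_path)
    if not os.path.isfile(full_path):
        raise ValueError("%r is not a file" % full_path)
    if not os.access(full_path, os.R_OK):
        raise ValueError("%r is not readable" % full_path)
    return full_path


with tempfile.TemporaryDirectory() as workdir:
    cfg_path = os.path.join(workdir, "config.yaml")
    data_path = os.path.join(workdir, "reads.fastq")
    for file_path in (cfg_path, data_path):
        with open(file_path, "w") as handle:
            handle.write("")
    # A relative and an absolute spelling resolve to the same file.
    from_relative = rel_file_r_sketch("reads.fastq", cfg_path)
    from_absolute = rel_file_r_sketch(data_path, cfg_path)
    same_file = os.path.samefile(from_relative, from_absolute)
```

A read-write variant would additionally test os.access(full_path, os.W_OK).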

mockinbird.utils.config_validation.rel_genome_validator(path, cfg_path)

Validates a genome and assures read permissions. Asserts the presence of a fasta index

The path can either be absolute or relative to the parent folder of cfg_path. The fasta index can be created by samtools faidx </path/to/file.fasta>. The fasta index has to have the same name and end with .fai.
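
The index requirement can be sketched as a check for a sibling file. rel_genome_sketch is a hypothetical re-implementation. It assumes that "same name" means the genome path with .fai appended, which is the file that samtools faidx writes, and that a failed check raises ValueError.

```python
import os
import tempfile


def rel_genome_sketch(path, cfg_path):
    """Require a readable genome file with a fasta index next to it."""
    cfg_dir = os.path.dirname(os.path.abspath(cfg_path))
    full_path = os.path.join(cfg_dir, path)
    if not (os.path.isfile(full_path) and os.access(full_path, os.R_OK)):
        raise ValueError("%r is not a readable file" % full_path)
    # Assumption: the index is the genome path with ".fai" appended.
    index_path = full_path + ".fai"
    if not os.path.isfile(index_path):
        raise ValueError("missing index; run: samtools faidx %s" % full_path)
    return full_path


with tempfile.TemporaryDirectory() as workdir:
    cfg_path = os.path.join(workdir, "config.yaml")
    genome_path = os.path.join(workdir, "genome.fasta")
    with open(genome_path, "w") as handle:
        handle.write(">chr1\nACGT\n")
    try:
        rel_genome_sketch("genome.fasta", cfg_path)
        index_required = False
    except ValueError:
        index_required = True
    # Create an empty placeholder index so that the check passes.
    with open(genome_path + ".fai", "w") as handle:
        handle.write("")
    validated_name = os.path.basename(rel_genome_sketch("genome.fasta", cfg_path))
```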

mockinbird.utils.config_validation.rel_mapindex_validator(genome_index, cfg_path)

Validates a genome index

The path can either be absolute or relative to the parent folder of cfg_path