bcftbx.Experiment

Experiment.py

The Experiment module provides the following classes: the Experiment class defines a single experiment (essentially a collection of one or more related primary data sets) from a SOLiD run; the ExperimentList class is a collection of experiments which are typically part of the same SOLiD run; and the LinkNames class constructs names for links to primary data files.

class bcftbx.Experiment.Experiment

Class defining an experiment from a SOLiD run.

An ‘experiment’ is a collection of related data.

copy()

Return a new Experiment instance which is a copy of this one.

describe()

Describe the experiment as a set of command line options

dirname(top_dir=None)

Return directory name for experiment

The directory name is the supplied name plus the experiment type joined by an underscore, unless no type was specified (in which case it is just the experiment name).

If top_dir is also supplied then this will be prepended to the returned directory name.
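
The naming rule described above can be sketched as a standalone function. This is an illustration only, not the module's implementation: the function name experiment_dirname and its expt_type argument are hypothetical, and the use of os.path.join to prepend top_dir is an assumption.

```python
import os

def experiment_dirname(name, expt_type=None, top_dir=None):
    """Hypothetical helper mirroring the rule documented for dirname()."""
    # Name and type are joined by an underscore, unless no type was given
    if expt_type:
        dirname = "%s_%s" % (name, expt_type)
    else:
        dirname = name
    # Assumption: top_dir is prepended as a leading path component
    if top_dir:
        dirname = os.path.join(top_dir, dirname)
    return dirname

print(experiment_dirname("PB", "expt"))
print(experiment_dirname("PB"))
print(experiment_dirname("PB", "expt", top_dir="analysis"))
```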

class bcftbx.Experiment.ExperimentList(solid_run_dir=None)

Container for a collection of Experiments

Experiments are created and added to the ExperimentList by calling the addExperiment method, which returns a new Experiment object.

The calling subprogram then populates the Experiment properties as appropriate.

Once all Experiments are defined the analysis directory can be constructed by calling the buildAnalysisDirs method, which creates directories and symbolic links to primary data according to the definition of each experiment.

addDuplicateExperiment(expt)

Duplicate an existing Experiment and add to the list

Parameters:

expt – an existing Experiment object

Returns:

New Experiment object with the same data as the input

addExperiment(name)

Create a new Experiment and add to the list

Parameters:

name – the name of the new experiment

Returns:

New Experiment object with name already set

buildAnalysisDirs(top_dir=None, dry_run=False, link_type='relative', naming_scheme='partial')

Construct and populate analysis directories for the experiments

For each defined experiment, create the required analysis directories and populate with links to the primary data files.

Parameters:
  • top_dir – if set then create the analysis directories as subdirectories of the specified directory; otherwise operate in the current working directory.

  • dry_run – if True then only report the mkdir, ln etc operations that would be performed. Default is False (do perform the operations).

  • link_type – type of link to use when linking to primary data, one of ‘relative’ (the default) or ‘absolute’.

  • naming_scheme – naming scheme to use for links to primary data, one of ‘full’ (same names as primary data files), ‘partial’ (cut-down version of the full name which excludes sample names - the default), or ‘minimal’ (just the library name).
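
How dry_run and link_type might interact can be sketched with a small standalone helper. This is not the code used by buildAnalysisDirs: the function make_link, its arguments and the reported "ln -s" string are all hypothetical, and the sketch assumes a platform where os.symlink is available.

```python
import os

def make_link(source, link_path, link_type="relative", dry_run=False):
    """Hypothetical helper: create (or only report) a symbolic link.

    Returns the equivalent shell command as a string.
    """
    source = os.path.abspath(source)
    link_path = os.path.abspath(link_path)
    if link_type == "absolute":
        target = source
    elif link_type == "relative":
        # Express the target relative to the directory holding the link
        target = os.path.relpath(source, os.path.dirname(link_path))
    else:
        raise ValueError("Unrecognised link type '%s'" % link_type)
    command = "ln -s %s %s" % (target, link_path)
    if not dry_run:
        # Only touch the file system when this is not a dry run
        os.symlink(target, link_path)
    return command
```

With dry_run=True the helper only returns the command that would be run, so the planned layout can be reviewed before anything is created.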

getLastExperiment()

Return the last Experiment added to the list
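
The add/duplicate/get-last pattern described for ExperimentList can be illustrated with simplified stand-in classes. SketchExperiment and SketchExperimentList are hypothetical and are not the bcftbx classes; only the method names are taken from this page, and the name and type attributes, the shallow copy, and the None return for an empty list are assumptions.

```python
import copy

class SketchExperiment:
    """Hypothetical stand-in holding only a name and a type."""
    def __init__(self, name=None, type=None):
        self.name = name
        self.type = type

    def copy(self):
        # Return a new instance carrying the same data
        return copy.copy(self)

class SketchExperimentList:
    """Hypothetical stand-in for a container of experiments."""
    def __init__(self):
        self.experiments = []

    def addExperiment(self, name):
        # Create a new experiment with the name already set
        expt = SketchExperiment(name=name)
        self.experiments.append(expt)
        return expt

    def addDuplicateExperiment(self, expt):
        # Copy an existing experiment and append the copy
        new_expt = expt.copy()
        self.experiments.append(new_expt)
        return new_expt

    def getLastExperiment(self):
        # Assumption: None is returned when the list is empty
        return self.experiments[-1] if self.experiments else None

expts = SketchExperimentList()
first = expts.addExperiment("PB")
first.type = "expt"                      # caller populates the properties
second = expts.addDuplicateExperiment(first)
second.type = "chip"                     # the copy can diverge from the original
```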

class bcftbx.Experiment.LinkNames(scheme)

Class to construct names for links to primary data files

The LinkNames class encodes a set of naming schemes that are used to construct names for the links in the analysis directories that point to the primary CSFASTA and QUAL data files.

The schemes are:

full: link name is the same as the source file, e.g.

solid0123_20111014_FRAG_BC_AB_CD_EF_pool_F3_CD_PQ5.csfasta

partial: link name consists of the instrument name, datestamp and library name, e.g.

solid0123_20111014_CD_PQ5.csfasta

minimal: link name consists of just the library name, e.g.

CD_PQ5.csfasta

For paired-end data, the ‘partial’ and ‘minimal’ names have ‘_F3’ and ‘_F5’ appended as appropriate (full names already have this distinction).
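
The three schemes can be sketched as a standalone function that works on plain strings rather than a SolidLibrary object. This is an illustration under stated assumptions, not the LinkNames implementation: the function link_names and all of its arguments are hypothetical, the ‘.qual’ extension for the partial and minimal QUAL links is assumed, and the QUAL file name in the usage lines is invented for the example.

```python
import os

def link_names(scheme, instrument, datestamp, library_name,
               csfasta, qual, paired_end=False, F5=False):
    """Hypothetical helper returning (csfasta_link_name, qual_link_name)."""
    if scheme == "full":
        # Same names as the source files
        return (os.path.basename(csfasta), os.path.basename(qual))
    if scheme == "partial":
        base = "_".join([instrument, datestamp, library_name])
    elif scheme == "minimal":
        base = library_name
    else:
        raise ValueError("Unrecognised naming scheme '%s'" % scheme)
    if paired_end:
        # Distinguish the two read sets for paired-end data
        base += "_F5" if F5 else "_F3"
    # Assumption: QUAL links use a '.qual' extension
    return (base + ".csfasta", base + ".qual")

csfasta = "solid0123_20111014_FRAG_BC_AB_CD_EF_pool_F3_CD_PQ5.csfasta"
qual = "solid0123_20111014_FRAG_BC_AB_CD_EF_pool_F3_QV_CD_PQ5.qual"  # invented name
print(link_names("partial", "solid0123", "20111014", "CD_PQ5", csfasta, qual))
print(link_names("minimal", "solid0123", "20111014", "CD_PQ5", csfasta, qual,
                 paired_end=True, F5=True))
```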

Example usage:

To get the link names using the minimal scheme for the F3 reads (‘library’ is a SolidLibrary object):

>>> csfasta_lnk,qual_lnk = LinkNames('minimal').names(library)

To get names for the F5 reads using the partial scheme:

>>> csfasta_lnk,qual_lnk = LinkNames('partial').names(library,F5=True)

names(library, F5=False)

Get names for links to the primary data in a library

Returns a tuple of link names:

(csfasta_link_name,qual_link_name)

derived from the data in the library plus the naming scheme specified when the LinkNames object was created.

Parameters:
  • library – SolidLibrary object

  • F5 – if True then indicates that names should be returned for linking to the F5 reads (default is F3 reads)