bcftbx.qc

bcftbx.qc.report

Utilities for generating reports for NGS QC pipeline runs.

class bcftbx.qc.report.IlluminaQCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)

Class for reporting a QC run on Illumina data

IlluminaQCReporter assembles the data associated with a QC run for a set of Illumina data and generates an HTML document which summarises the results for quick review.

report()

Write the HTML report

Writes an HTML document ‘qc_report.html’ to the top-level analysis directory.

zip()

Make a zip file containing the report and the images

Generate the ‘qc_report.html’ file and make a zip file ‘qc_report.<run>.<name>.zip’ containing the report plus the associated image files; the archive can be unpacked elsewhere for viewing.

Returns:

Name of the zip file with the report.

class bcftbx.qc.report.IlluminaQCSample(name, qc_dir, fastq=None)

Class for holding QC data for an Illumina sample

An Illumina QC run typically consists of contamination screens and output from FastQC.

property is_empty

Return True if the sample has no reads, False otherwise

report(html)

Write HTML report for this sample

verify()

Check QC products for this sample

Checks that fastq_screens and FastQC files were found. Returns True if the QC products are present and False otherwise.

class bcftbx.qc.report.QCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)

Base class for reporting QC runs

This is a general class for reporting runs of the FLS NGS QC pipelines. QC reporters for particular pipelines should subclass QCReporter and implement the ‘report’ method to generate the HTML output.

addSample(sample)

Add a QCSample class or subclass to the sample list

property data_format

Return the format for the primary data files

property dirn

Return top-level directory containing data

getPrimaryDataFiles()

Return list of primary data file sets

Returns a list of primary data file names; use the ‘primary_data_dir’ property to get the directory where the files are actually located.

property html

Return HTMLPageWriter instance for the report

property name

Return name of experiment

property primary_data_dir

Return location of primary data files

property qc_dir

Return directory holding QC outputs

report()

Generate an HTML report

This method must be implemented by the subclass.

property report_base_name

Return the base name for the report

property report_name

Return the full name for the report

property run

Return name of run

property samples

Return list of samples

verify()

Check that the QC outputs are correct

Returns True if the QC appears to have run successfully, False if not.

zip()

Make a zip file containing the report and the images

Generate the ‘qc_report.html’ file and make a zip file ‘qc_report.<run>.<name>.zip’ which contains the report plus the associated image files etc. The archive can then be unpacked elsewhere for viewing.

Returns:

Name of the zip file with the report.

exception bcftbx.qc.report.QCReporterError

Base class for errors with QCReporter-related code

class bcftbx.qc.report.QCSample(name, qc_dir)

Base class for reporting QC for a single sample

This is a general class for reporting the QC outputs associated with a single sample. It attempts to find all possible associated QC products for the given sample name.

Specific pipelines should subclass QCSample and implement the ‘report’ method, which can call the ‘report_*’ methods to produce HTML code specific to the pipeline in question.

addBoxplot(boxplot)

Associate a boxplot with the sample

Parameters:

boxplot – boxplot file name

addFastQC(fastqc_dir)

Associate a FastQC output directory with the sample

addProgramInfo(programs)

Collect program information from ‘programs’ file

addScreen(screen)

Associate a fastq_screen with the sample

Parameters:

screen – fastq_screen file name

boxplots()

Return list of boxplots for a sample

property fastqc

Return name of FastQC run dir

property programs

Return data on programs

report()

Generate an HTML report

This method must be implemented by the subclass.

report_boxplots(html, paired_end=False, inline_pngs=True)

Write HTML code reporting the boxplots

Parameters:
  • html – HTMLPageWriter instance to add the generated HTML to

  • inline_pngs – if set to True then embed the PNG images as base64-encoded data; otherwise link to the original image file
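As a rough illustration of what the inline_pngs option means, the sketch below shows one way a PNG could either be embedded in an <img> tag as a base64 data URI or simply linked by path. The helper png_img_tag is hypothetical and is not part of bcftbx; it uses only the Python standard library.

```python
import base64
import html as html_lib
import os


def png_img_tag(png_file, inline_pngs=True, alt=None):
    """Return an HTML <img> tag for a PNG file.

    Hypothetical helper (not part of bcftbx): if inline_pngs is True the
    image bytes are embedded as a base64 data URI, otherwise the tag
    links to the original file path.
    """
    if alt is None:
        alt = os.path.basename(png_file)
    if inline_pngs:
        # Read the raw bytes and encode them as ASCII base64 text
        with open(png_file, "rb") as fp:
            encoded = base64.b64encode(fp.read()).decode("ascii")
        src = "data:image/png;base64," + encoded
    else:
        # Link to the image file instead of embedding it
        src = png_file
    return '<img src="%s" alt="%s" />' % (
        html_lib.escape(src, quote=True),
        html_lib.escape(alt, quote=True),
    )
```

Inlining makes the HTML file self-contained at the cost of a larger page; linking keeps the page small, but the image files then have to travel with it, which is what the zip() methods take care of.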

report_fastqc(html, inline_pngs=True)

Write HTML code reporting the results from FastQC

Parameters:

html – HTMLPageWriter instance to add the generated HTML to

report_programs(html)

Write HTML code reporting the program information

report_screens(html, inline_pngs=True)

Write HTML code reporting the fastq screens

Parameters:
  • html – HTMLPageWriter instance to add the generated HTML to

  • inline_pngs – if set to True then embed the PNG images as base64-encoded data; otherwise link to the original image file

screens()

Return list of screens for a sample

verify()

Verify expected QC products for the sample

This method must be implemented by the subclass. It should return True if the QC appears to have run successfully for the sample, False if not.

zip_includes()

Return list of files and directories to archive

class bcftbx.qc.report.SolidQCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)

Class for reporting a QC run on SOLiD data

SolidQCReporter assembles the data associated with a QC run for a set of SOLiD data and generates an HTML document which summarises the results for quick review.

report()

Write the HTML report

Writes an HTML document ‘qc_report.html’ to the top-level analysis directory.

verify()

Verify that SOLiD QC completed successfully for all samples

Returns True if the QC appears to have run successfully, False if not.

class bcftbx.qc.report.SolidQCSample(name, qc_dir, paired_end)

Class for holding QC data for a SOLiD sample

A SOLiD QC run typically consists of filtered and unfiltered boxplots, quality filtering stats, and contamination screens.

report(html)

Write HTML report for this sample

verify()

Check QC products for this sample

Checks that fastq_screens and boxplots were found. Returns True if the QC products are present and False otherwise.

bcftbx.qc.report.add_dir_to_zip(z, dirn, zip_top_dir=None)

Recursively add a directory and its contents to a zip archive

z is a zipfile.ZipFile object already opened for writing; this function adds all files in directory dirn and its subdirectories to z.

If zip_top_dir is not None then this is prepended to the file name written to the zip archive.
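A minimal sketch of the recursive walk, assuming archive names should keep dirn's own name relative to its parent directory (the naming used by the actual bcftbx function may differ). The function name add_dir_to_zip_sketch is hypothetical; only os and zipfile from the standard library are used.

```python
import os
import zipfile


def add_dir_to_zip_sketch(z, dirn, zip_top_dir=None):
    """Recursively add the contents of dirn to an open ZipFile z.

    Illustrative re-implementation, not the bcftbx source. Archive names
    are taken relative to the parent of dirn (so dirn's own name is
    kept), then optionally prefixed with zip_top_dir.
    """
    parent = os.path.dirname(os.path.abspath(dirn))
    # os.walk descends into every subdirectory of dirn
    for dirpath, _dirnames, filenames in os.walk(dirn):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            arcname = os.path.relpath(os.path.abspath(path), parent)
            if zip_top_dir is not None:
                arcname = os.path.join(zip_top_dir, arcname)
            # zipfile normalises os.sep to '/' in stored names
            z.write(path, arcname)
```

Because the caller passes in an already-open ZipFile, choices such as the compression method stay with the caller, for example zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED).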

bcftbx.qc.report.count_reads(csfasta_file)

Count the number of reads in a CSFASTA file

Returns number of reads, or None
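CSFASTA files store each read as a '>' header line followed by a colour-space sequence line, optionally preceded by '#' comment lines, so counting reads amounts to counting header lines. The sketch below assumes that layout, and assumes that None signals a file that could not be read; count_csfasta_reads is a hypothetical name, not the bcftbx implementation.

```python
def count_csfasta_reads(csfasta_file):
    """Count reads in a CSFASTA file by counting '>' header lines.

    Sketch only: '#' comment lines and sequence lines are ignored because
    they do not start with '>'. Returning None for an unreadable file is
    an assumption about when the documented function returns None.
    """
    try:
        with open(csfasta_file) as fp:
            return sum(1 for line in fp if line.startswith(">"))
    except OSError:
        return None
```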

bcftbx.qc.report.is_boxplot(name, f)

Return True if f is a qc_boxplot associated with name

‘name’ can be a file name, or a file ‘root’, i.e. the file name with all trailing extensions removed.

bcftbx.qc.report.is_fastq_screen(name, f)

Return True if f is a fastq_screen file associated with name

‘name’ can be a file name, or a file ‘root’, i.e. the file name with all trailing extensions removed.

bcftbx.qc.report.is_fastqc(name, f)

Return True if f is a FastQC file associated with name

‘name’ can be a file name, or a file ‘root’, i.e. the file name with all trailing extensions removed.

bcftbx.qc.report.is_program_info(name, f)

Return True if f is a ‘program info’ file associated with name

‘name’ can be a file name, or a file ‘root’, i.e. the file name with all trailing extensions removed.

bcftbx.qc.report.split_sample_name(name)

Split name into leading part plus trailing number

Returns (start,number)
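A regular-expression sketch of this split is shown below. It assumes the trailing digits are returned as an int and that (name, None) is returned when there are no trailing digits; both behaviours are assumptions, and split_sample_name_sketch is a hypothetical name.

```python
import re


def split_sample_name_sketch(name):
    """Split name into (leading part, trailing number).

    Sketch only: trailing digits are converted to int, and (name, None)
    is returned if name does not end in a digit.
    """
    # Non-greedy prefix, then one or more digits anchored at the end
    match = re.match(r"^(.*?)(\d+)$", name)
    if match is None:
        return (name, None)
    return (match.group(1), int(match.group(2)))
```

A split like this is handy as a sort key, so that "PB2" sorts before "PB10". Note that in Python 3 mixing names with and without trailing numbers in such a sort raises TypeError, since None cannot be compared with an int.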

bcftbx.qc.report.strip_ngs_extensions(name)

Remove fastq, csfasta or qual extensions from name
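One way to strip stacked extensions such as '.fastq.gz' is to call os.path.splitext repeatedly until the last extension is not a recognised one. The extension tuple below (including '.fq' and '.gz') is an assumption rather than the list bcftbx actually uses, and strip_ngs_extensions_sketch is a hypothetical name.

```python
import os

# Assumed set of NGS extensions; the real function's list may differ
NGS_EXTENSIONS = (".fastq", ".fq", ".gz", ".csfasta", ".qual")


def strip_ngs_extensions_sketch(name):
    """Repeatedly remove trailing NGS extensions from name."""
    while True:
        root, ext = os.path.splitext(name)
        # Stop as soon as the last extension is not a recognised one
        if ext.lower() not in NGS_EXTENSIONS:
            return name
        name = root
```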