bcftbx.qc
bcftbx.qc.report
Utilities for generating reports for NGS QC pipeline runs.
- class bcftbx.qc.report.IlluminaQCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)
Class for reporting QC run on Illumina data
IlluminaQCReporter assembles the data associated with a QC run for a set of Illumina data and generates an HTML document which summarises the results for quick review.
- report()
Write the HTML report
Writes an HTML document ‘qc_report.html’ to the top-level analysis directory.
- zip()
Make a zip file containing the report and the images
Generates the ‘qc_report.html’ file and makes a zip file ‘qc_report.<run>.<name>.zip’ containing the report plus the associated image files. The archive can then be unpacked elsewhere for viewing.
- Returns:
Name of the zip file with the report.
- class bcftbx.qc.report.IlluminaQCSample(name, qc_dir, fastq=None)
Class for holding QC data for an Illumina sample
An Illumina QC run typically consists of contamination screens and output from FastQC.
- property is_empty
Return True if the sample has no reads, False otherwise
- report(html)
Write HTML report for this sample
- verify()
Check QC products for this sample
Checks that fastq_screens and FastQC files were found. Returns True if the QC products are present and False otherwise.
- class bcftbx.qc.report.QCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)
Base class for reporting QC runs
This is a general class for reporting runs of the FLS NGS QC pipelines. QC reporters specific to particular pipelines should be subclassed from QCReporter and need to implement the ‘report’ method to generate the HTML output.
- addSample(sample)
Add a QCSample (or QCSample subclass) instance to the sample list
- property data_format
Return the format for the primary data files
- property dirn
Return top-level directory containing data
- getPrimaryDataFiles()
Return list of primary data file sets
Returns a list of primary data file names; use the ‘primary_data_dir’ property to get the directory where the files are actually located.
- property html
Return HTMLPageWriter instance for the report
- property name
Return name of experiment
- property primary_data_dir
Return location of primary data files
- property qc_dir
Return directory holding QC outputs
- report()
Generate an HTML report
This method must be implemented by the subclass.
- property report_base_name
Return the base name for the report
- property report_name
Return the full name for the report
- property run
Return name of run
- property samples
Return list of samples
- verify()
Check that the QC outputs are correct
Returns True if the QC appears to have run successfully, False if not.
- zip()
Make a zip file containing the report and the images
Generate the ‘qc_report.html’ file and make a zip file ‘qc_report.<run>.<name>.zip’ which contains the report plus the associated image files etc. The archive can then be unpacked elsewhere for viewing.
- Returns:
Name of the zip file with the report.
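The subclassing contract described for QCReporter (the base class collects samples and the subclass supplies ‘report’) can be sketched as below. This is a minimal illustration only: SketchReporter, SketchSample and ExampleReporter are hypothetical stand-ins written for this example and are not the bcftbx classes. The assumption that a run verifies only when every sample verifies, and the choice to return the HTML as a string rather than write a file, are also illustrative.

```python
class SketchSample:
    """Hypothetical stand-in for a QC sample; not the bcftbx class."""

    def __init__(self, name, found_products):
        self.name = name
        # Illustrative flag standing in for "QC products were found"
        self.found_products = found_products

    def verify(self):
        return self.found_products


class SketchReporter:
    """Hypothetical stand-in for a QC reporter base class."""

    def __init__(self, name):
        self.name = name
        self.samples = []

    def addSample(self, sample):
        self.samples.append(sample)

    def verify(self):
        # Assumption: the run passes only if every sample passes
        return all(sample.verify() for sample in self.samples)

    def report(self):
        # The base class leaves report generation to the subclass
        raise NotImplementedError("subclasses must implement report()")


class ExampleReporter(SketchReporter):
    """Pipeline-specific reporter supplying its own report()."""

    def report(self):
        rows = ["<li>%s: %s</li>" % (s.name, "ok" if s.verify() else "missing QC")
                for s in self.samples]
        return "<h1>%s</h1>\n<ul>\n%s\n</ul>" % (self.name, "\n".join(rows))
```

Calling report() on the base class raises NotImplementedError, which mirrors the requirement that the method be implemented by the subclass.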
- exception bcftbx.qc.report.QCReporterError
Base class for errors with QCReporter-related code
- class bcftbx.qc.report.QCSample(name, qc_dir)
Base class for reporting QC for a single sample
This is a general class for reporting the QC outputs associated with a single sample. It attempts to find all possible associated QC products for the given sample name.
Specific pipelines should subclass QCSample and implement the ‘report’ method, which can call the ‘report_*’ methods to produce HTML code specific to the pipeline in question.
- addBoxplot(boxplot)
Associate a boxplot with the sample
- Parameters:
boxplot – boxplot file name
- addFastQC(fastqc_dir)
Associate a FastQC output directory with the sample
- addProgramInfo(programs)
Collect program information from ‘programs’ file
- addScreen(screen)
Associate a fastq_screen with the sample
- Parameters:
screen – fastq_screen file name
- boxplots()
Return list of boxplots for a sample
- property fastqc
Return name of FastQC run dir
- property programs
Return data on programs
- report()
Generate an HTML report
This method must be implemented by the subclass.
- report_boxplots(html, paired_end=False, inline_pngs=True)
Write HTML code reporting the boxplots
- Parameters:
html – HTMLPageWriter instance to add the generated HTML to
inline_pngs – if set True then embed the PNG images as base64 encoded data; otherwise link to the original image file
- report_fastqc(html, inline_pngs=True)
Write HTML code reporting the results from FastQC
- Parameters:
html – HTMLPageWriter instance to add the generated HTML to
- report_programs(html)
Write HTML code reporting the program information
- report_screens(html, inline_pngs=True)
Write HTML code reporting the fastq screens
- Parameters:
html – HTMLPageWriter instance to add the generated HTML to
inline_pngs – if set True then embed the PNG images as base64 encoded data; otherwise link to the original image file
- screens()
Return list of screens for a sample
- verify()
Verify expected QC products for the sample
This method must be implemented by the subclass. It should return True if the QC appears to have run successfully for the sample, False if not.
- zip_includes()
Return list of files and directories to archive
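The inline_pngs option of report_boxplots and report_screens chooses between embedding each PNG as base64 encoded data and linking to the original image file. A minimal sketch of the two modes follows, using only the standard library. The helper name png_img_tag and the exact markup it emits are assumptions made for this example; the HTML that bcftbx actually generates may differ.

```python
import base64


def png_img_tag(png_file, inline_pngs=True):
    """Return an <img> tag for png_file (hypothetical helper).

    With inline_pngs=True the image bytes are embedded as a base64
    data URI, so the HTML file is viewable on its own. Otherwise the
    tag links to the original image file.
    """
    if not inline_pngs:
        return '<img src="%s" />' % png_file
    with open(png_file, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return '<img src="data:image/png;base64,%s" />' % encoded
```

Embedding makes the report self-contained at the cost of a larger HTML file; linking keeps the HTML small but requires the image files to travel with it.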
- class bcftbx.qc.report.SolidQCReporter(dirn, data_format=None, qc_dir='qc', regex_pattern=None, version=None)
Class for reporting QC run on SOLiD data
SolidQCReporter assembles the data associated with a QC run for a set of SOLiD data and generates an HTML document which summarises the results for quick review.
- report()
Write the HTML report
Writes an HTML document ‘qc_report.html’ to the top-level analysis directory.
- verify()
Verify that SOLiD QC completed successfully for all samples
Returns True if the QC appears to have run successfully, False if not.
- class bcftbx.qc.report.SolidQCSample(name, qc_dir, paired_end)
Class for holding QC data for a SOLiD sample
A SOLiD QC run typically consists of filtered and unfiltered boxplots, quality filtering stats, and contamination screens.
- report(html)
Write HTML report for this sample
- verify()
Check QC products for this sample
Checks that fastq_screens and boxplots were found. Returns True if the QC products are present and False otherwise.
- bcftbx.qc.report.add_dir_to_zip(z, dirn, zip_top_dir=None)
Recursively add a directory and its contents to a zip archive
z is a zipfile.ZipFile object already opened for writing; this function adds all files in directory dirn and its subdirectories to z.
If zip_top_dir is not None then this is prepended to the file name written to the zip archive.
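The behaviour described above can be approximated with the standard library as follows. This is an illustrative re-implementation, not the bcftbx source: the name add_dir_to_zip_sketch is hypothetical, and storing each file under a path that starts with the name of dirn itself is an assumption about the archive layout.

```python
import os
import zipfile


def add_dir_to_zip_sketch(z, dirn, zip_top_dir=None):
    """Recursively add dirn and its contents to the open ZipFile z."""
    # Assumption: archive names are relative to the parent of dirn,
    # so the directory's own name is preserved inside the archive
    parent = os.path.dirname(os.path.abspath(dirn))
    for root, _dirs, files in os.walk(dirn):
        for fname in sorted(files):
            path = os.path.join(root, fname)
            arcname = os.path.relpath(path, parent)
            if zip_top_dir is not None:
                # Prepend the requested top-level directory name
                arcname = os.path.join(zip_top_dir, arcname)
            z.write(path, arcname)
```

A typical call opens the archive with zipfile.ZipFile(name, "w") and passes the resulting object as z, matching the requirement that z is already open for writing.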
- bcftbx.qc.report.count_reads(csfasta_file)
Count the number of reads in a CSFASTA file
Returns number of reads, or None
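One way to count reads in a CSFASTA file is to count the header lines, each of which starts with ‘>’. The sketch below takes that approach. The name count_reads_sketch is hypothetical, and because the documentation does not say when None is returned, the sketch assumes it signals a file that could not be read.

```python
def count_reads_sketch(csfasta_file):
    """Count reads in a CSFASTA file by counting '>' header lines."""
    try:
        with open(csfasta_file) as fh:
            # Comment lines ('#') and sequence lines are skipped
            return sum(1 for line in fh if line.startswith(">"))
    except OSError:
        # Assumption: None means the file could not be opened or read
        return None
```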
- bcftbx.qc.report.is_boxplot(name, f)
Return True if f is a qc_boxplot associated with name
‘name’ can be a file name, or a file ‘root’ i.e. filename with all trailing extensions removed.
- bcftbx.qc.report.is_fastq_screen(name, f)
Return True if f is a fastq_screen file associated with name
‘name’ can be a file name, or a file ‘root’ i.e. filename with all trailing extensions removed.
- bcftbx.qc.report.is_fastqc(name, f)
Return True if f is a FastQC file associated with name
‘name’ can be a file name, or a file ‘root’ i.e. filename with all trailing extensions removed.
- bcftbx.qc.report.is_program_info(name, f)
Return True if f is a ‘program info’ file associated with name
‘name’ can be a file name, or a file ‘root’ i.e. filename with all trailing extensions removed.
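The file ‘root’ accepted by the is_* functions above is the file name with all trailing extensions removed. A minimal sketch of deriving such a root is shown below. The helper file_root is hypothetical and is not part of bcftbx; how the is_* functions themselves match f against name is not covered here.

```python
import os


def file_root(name):
    """Return the base name of name with all trailing extensions removed."""
    root = os.path.basename(name)
    while True:
        # Peel off one extension at a time until none remain
        root, ext = os.path.splitext(root)
        if not ext:
            return root
```

For example, under this sketch both ‘PJB_S1_R1.fastq.gz’ and ‘/data/PJB_S1_R1.fastq’ reduce to the root ‘PJB_S1_R1’.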
- bcftbx.qc.report.split_sample_name(name)
Split name into leading part plus trailing number
Returns (start,number)
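Splitting a name into a leading part and a trailing number can be sketched as follows. The name split_sample_name_sketch is hypothetical. Returning the number as an int, and returning None when the name has no trailing digits, are both assumptions, since the documentation specifies only the (start,number) shape of the result.

```python
def split_sample_name_sketch(name):
    """Split name into (start, number), e.g. 'PJB12' into ('PJB', 12)."""
    # Everything left after removing trailing digits is the leading part
    start = name.rstrip("0123456789")
    digits = name[len(start):]
    # Assumption: the number is an int, or None if no digits are present
    return start, (int(digits) if digits else None)
```

A split of this kind is useful as a sort key when all names end in a number, because it orders them numerically rather than lexically, so that ‘PJB2’ sorts before ‘PJB10’.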
- bcftbx.qc.report.strip_ngs_extensions(name)
Remove fastq, csfasta or qual extensions from name
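Stripping the extensions named above can be sketched as below. The name strip_ngs_extensions_sketch and the NGS_EXTENSIONS tuple are hypothetical. Matching case-sensitively, and removing several listed extensions in a row when they are stacked, are assumptions rather than documented behaviour.

```python
import os

# Extensions named in the description above (illustrative constant)
NGS_EXTENSIONS = (".fastq", ".csfasta", ".qual")


def strip_ngs_extensions_sketch(name):
    """Remove trailing fastq, csfasta or qual extensions from name."""
    while True:
        root, ext = os.path.splitext(name)
        if ext not in NGS_EXTENSIONS:
            # Unrecognised or missing extension: leave the rest as is
            return name
        name = root
```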