bcftbx.FASTQFile

Legacy module providing a set of classes for reading through FASTQ files and manipulating the data within them.

The core functionality has been reimplemented in the io.fastq module. The classes and functions in this module are deprecated and maintained only for backwards compatibility; they will be removed in a future release.

The legacy classes and functions are:

  • FastqIterator: enables looping through all read records in a FASTQ file

  • FastqRead: provides access to a single FASTQ read record

  • SequenceIdentifier: provides access to sequence identifier info in a read

  • FastqAttributes: provides access to gross attributes of a FASTQ file

  • get_fastq_file_handle: return a file handle opened for reading a FASTQ file

  • nreads: return the number of reads in a FASTQ file

  • fastqs_are_pair: check whether two FASTQs form an R1/R2 pair

Note

The FastqAttributes class has not been reimplemented in io.fastq; it should no longer be used.

class bcftbx.FASTQFile.FastqAttributes(fastq_file=None, fp=None)

Class to provide access to gross attributes of a FASTQ file

Deprecated legacy class; do not use as it will be removed in a future release.

Given a FASTQ file (can be uncompressed or gzipped), enables various attributes to be queried via the following properties:

  • nreads: number of reads in the FASTQ file

  • fsize: size of the file (in bytes)

property fsize

Return size of the FASTQ file (bytes)

property nreads

Return number of reads in the FASTQ file
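Because this class has no replacement, the two attributes may need to be computed directly. The following is a minimal standalone sketch, not the library's implementation: the helper names open_fastq, count_reads and file_size are hypothetical, gzipped files are assumed to be recognisable by a ".gz" extension, and every record is assumed to occupy exactly four lines.

```python
import gzip
import os


def open_fastq(path):
    """Open a plain or gzipped FASTQ file for reading as text.

    Assumption: gzipped files are identified by a '.gz' extension.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count reads, assuming exactly four lines per FASTQ record."""
    with open_fastq(path) as fp:
        nlines = sum(1 for _ in fp)
    if nlines % 4 != 0:
        raise ValueError(
            f"{path}: line count {nlines} is not a multiple of 4"
        )
    return nlines // 4


def file_size(path):
    """Return the on-disk size in bytes (compressed size for .gz files)."""
    return os.path.getsize(path)
```

Note that for a gzipped file this sketch reports the compressed size; whether the legacy fsize property does the same is not stated above.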

class bcftbx.FASTQFile.FastqIterator(fastq_file=None, fp=None, bufsize=None)

Iterator for looping over all records in a FASTQ file.

Deprecated legacy class: use FastqIterator from io.fastq instead.

Parameters:
  • fastq_file (str) – name of the FASTQ file to iterate through

  • fp (any) – file-like object opened for reading

  • bufsize (int) – optional integer specifying number of bytes to read as a single ‘chunk’ from disk
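To illustrate what iterating over records involves, here is a minimal standalone sketch that groups lines from a file-like object into four-line records. It does not use or reproduce the legacy class: iter_fastq_records is a hypothetical name, the sketch assumes strict four-line records, and it relies on Python's default buffering rather than a bufsize setting.

```python
import io


def iter_fastq_records(fp):
    """Yield (seqid_line, seq_line, optid_line, quality_line) tuples.

    Assumption: each record occupies exactly four lines.
    """
    record = []
    for line in fp:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []
    if record:
        # Leftover lines mean the final record was truncated
        raise ValueError("incomplete FASTQ record at end of input")


# Any object that yields lines works, e.g. an in-memory text stream
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nFFFF\n"
)
records = list(iter_fastq_records(example))
```

Each tuple holds the four lines in the same order as the FastqRead constructor arguments (seqid_line, seq_line, optid_line, quality_line).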

class bcftbx.FASTQFile.FastqRead(seqid_line=None, seq_line=None, optid_line=None, quality_line=None)

Class to store a FASTQ record with information about a read

Deprecated legacy class: use FastqRead from io.fastq instead.

Provides the following properties for accessing the read data:

  • seqid: the “sequence identifier” information (first line of the read record) as a SequenceIdentifier object

  • sequence: the raw sequence (second line of the record)

  • optid: the optional sequence identifier line (third line of the record)

  • quality: the quality values (fourth line of the record)

Additional properties:

  • raw_seqid: the original sequence identifier string supplied when the object was created

  • seqlen: length of the sequence

  • maxquality: maximum quality value (in character representation)

  • minquality: minimum quality value (in character representation)

  • is_colorspace: returns True if the read looks like a colorspace read, False otherwise
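The derived properties seqlen, minquality and maxquality can be pictured with a small standalone sketch. It is an assumption about how such values could be computed, not the class's own code: summarize_read is a hypothetical helper, and quality characters are compared by their character codes. The is_colorspace check is not covered here because its criteria are not described above.

```python
def summarize_read(sequence, quality):
    """Return length and min/max quality characters for one read.

    Assumption: quality characters are ordered by character code, so
    min() and max() on the string give the extreme quality values in
    their character representation.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return {
        "seqlen": len(sequence),
        "minquality": min(quality),
        "maxquality": max(quality),
    }


summary = summarize_read("ACGTN", "II5F#")
# summary == {"seqlen": 5, "minquality": "#", "maxquality": "I"}
```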

class bcftbx.FASTQFile.SequenceIdentifier(seqid)

Class to store/manipulate sequence identifier information from a FASTQ record

Deprecated legacy class: use SequenceIdentifier from io.fastq instead.

Provides access to the data items in the sequence identifier line of a FASTQ record.
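The data items are not listed here, so the following standalone sketch only illustrates the general idea. It assumes the Illumina (CASAVA 1.8 and later) identifier layout, and both the function parse_illumina18_seqid and the dictionary keys are hypothetical names chosen for this example rather than attributes of the class.

```python
def parse_illumina18_seqid(seqid):
    """Split an Illumina 1.8+ style sequence identifier into fields.

    Assumed layout:
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    if not seqid.startswith("@"):
        raise ValueError("sequence identifier must start with '@'")
    try:
        first, second = seqid[1:].split(" ", 1)
        instrument, run_id, flowcell, lane, tile, x, y = first.split(":")
        pair_id, bad_read, control_bits, index = second.split(":")
    except ValueError:
        raise ValueError(f"unrecognised identifier: {seqid!r}") from None
    return {
        "instrument": instrument,
        "run_id": run_id,
        "flowcell": flowcell,
        "lane": lane,
        "tile": tile,
        "x": x,
        "y": y,
        "pair_id": pair_id,
        "bad_read": bad_read,
        "control_bits": control_bits,
        "index": index,
    }


fields = parse_illumina18_seqid(
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
)
```

All values are kept as strings; identifiers in other formats raise ValueError rather than being partially parsed.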