bcftbx.ngsutils

ngsutils

Utility classes and functions specific to NGS applications.

Extracting reads from Fastq, csfasta and qual files:

  • getreads: fetch reads one-by-one from Fastq, csfasta or qual file

  • getreads_subset: fetch subset of reads specified by index

  • getreads_regex: fetch subset of reads matching regular expression

Extracting reads from Fastq, csfasta and qual files

bcftbx.ngsutils.getreads(filen)

Return Fastq, csfasta or qual file reads one-by-one

This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields read records one at a time. The read records are returned as lists of lines.

The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.

Lines starting with ‘#’ at the start of the file will be treated as comments and ignored. Lines starting with ‘#’ which occur in the body of the file (i.e. after one or more lines of data) will be treated as data.

Example usage:

>>> for r in getreads('illumina_R1.fq'):
...     print(r)

Parameters:

filen (str) – path of the file to fetch reads from

Yields:

list – next read record from the file, as a list of lines.
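
The behaviour described above (one record per iteration, gzip detected from the '.gz' extension, and '#' lines skipped only at the head of the file) can be illustrated with a minimal sketch. This is not the bcftbx implementation: iter_fastq_records is a hypothetical stand-in that handles Fastq only and assumes every record is exactly four lines long.

```python
import gzip


def iter_fastq_records(path):
    """Hypothetical Fastq-only stand-in for getreads (not bcftbx code).

    Yields each read as a list of four lines without trailing newlines.
    """
    # Assumption: gzip compression is detected purely from the extension
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fp:
        record = []
        in_header = True
        for line in fp:
            line = line.rstrip("\n")
            # '#' lines are comments only before the first line of data
            if in_header and line.startswith("#"):
                continue
            in_header = False
            record.append(line)
            # Assumption: a Fastq record is always exactly four lines
            if len(record) == 4:
                yield record
                record = []
        # An incomplete trailing record (fewer than four lines) is dropped
```

Because '#' is a valid Fastq quality character, a quality line that begins with '#' in the body of the file is kept as data in this sketch, which mirrors the comment rule stated above.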

bcftbx.ngsutils.getreads_subset(filen, indices)

Fetch subset of reads from Fastq, csfasta or qual file

This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields a subset of the read records which are referenced by the supplied iterable indices.

The subset comprises the reads at the index positions given in indices, with index 0 being the first read in the file. Each read is returned as a list of lines.

The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.

Example usage (returns 1st, 3rd and 5th reads only):

>>> for r in getreads_subset('illumina_R1.fq',(0,2,4)):
...     print(r)

Parameters:
  • filen (str) – path of the file to fetch reads from

  • indices (iterable) – indices of the reads to return, counting from 0

Yields:

list – next read record from the file, as a list of lines.
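
As an illustration of index-based selection, the following sketch filters any iterable of records by 0-based position. select_by_index is a hypothetical helper, not part of bcftbx; it assumes records are returned in file order and that duplicate indices are ignored, which may differ from what getreads_subset actually does.

```python
def select_by_index(records, indices):
    """Hypothetical helper: yield records whose 0-based position is in indices."""
    wanted = set(indices)
    if not wanted:
        return
    last = max(wanted)
    for i, record in enumerate(records):
        if i in wanted:
            yield record
        # Stop early once the highest requested index has been passed
        if i >= last:
            break
```

Stopping after the largest requested index avoids reading the rest of a large file when only the first few reads are wanted.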

bcftbx.ngsutils.getreads_regex(filen, pattern)

Fetch matching reads from Fastq, csfasta or qual file

This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields a subset of read records. Each read is returned as a list of lines.

The subset comprises the reads which match the supplied regular expression.

The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.

Example usage:

>>> for r in getreads_regex('illumina_R1.fq',"2102:3130"):
...     print(r)

Parameters:
  • filen (str) – path of the file to fetch reads from

  • pattern (str) – Python regular expression pattern

Yields:

list – next read record from the file, as a list of lines.
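
Regular-expression filtering over read records can be sketched as below. select_by_regex is a hypothetical helper, not part of bcftbx. The documentation above does not say which lines of a record are tested, so this sketch assumes a record matches if the pattern is found anywhere in any of its lines.

```python
import re


def select_by_regex(records, pattern):
    """Hypothetical helper: yield records in which any line matches pattern."""
    regex = re.compile(pattern)
    for record in records:
        # Assumption: a hit on any line (header, sequence or quality) counts
        if any(regex.search(line) for line in record):
            yield record
```

To restrict matching to the read header only, the test could instead be regex.search(record[0]).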