bcftbx.ngsutils
ngsutils
Utility classes and functions specific to NGS applications.
Extracting reads from Fastq, csfasta and qual files:
getreads: fetch reads one-by-one from Fastq, csfasta or qual file
getreads_subset: fetch subset of reads specified by index
getreads_regex: fetch subset of reads matching regular expression
Extracting reads from Fastq, csfasta and qual files
- bcftbx.ngsutils.getreads(filen)
Return Fastq, csfasta or qual file reads one-by-one
This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields read records one at a time. The read records are returned as lists of lines.
The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.
Lines starting with ‘#’ at the start of the file will be treated as comments and ignored. Lines starting with ‘#’ which occur in the body of the file (i.e. after one or more lines of data) will be treated as data.
Example usage:
>>> for r in getreads('illumina_R1.fq'):
...     print(r)
- Parameters:
filen (str) – path of the file to fetch reads from
- Yields:
List – next read record from the file, as a list of lines.
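The behaviour described above (extension-based gzip handling, leading ‘#’ comments skipped, records returned as lists of lines) can be illustrated with a minimal stand-alone sketch. This is not the bcftbx implementation: the helper name iter_reads_sketch is hypothetical, and the lines_per_read argument (4 for Fastq, 2 for csfasta or qual), the stripping of trailing newlines, and the dropping of an incomplete final record are all assumptions made for illustration.

```python
import gzip


def iter_reads_sketch(filen, lines_per_read=4):
    """Yield read records from a sequence file as lists of lines.

    Hypothetical stand-in for illustration only, not the bcftbx code.
    """
    # Assumption: gzipped input is recognised purely by the '.gz' extension
    opener = gzip.open if filen.endswith('.gz') else open
    with opener(filen, 'rt') as fp:
        in_header = True
        record = []
        for line in fp:
            # '#' lines count as comments only before the first data line
            if in_header and line.startswith('#'):
                continue
            in_header = False
            # Assumption: trailing newlines are stripped from each line
            record.append(line.rstrip('\n'))
            if len(record) == lines_per_read:
                yield record
                record = []
        # Assumption: an incomplete trailing record is silently dropped
```

Because the function is a generator, only one record is held in memory at a time, which matters for large sequence files.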
- bcftbx.ngsutils.getreads_subset(filen, indices)
Fetch subset of reads from Fastq, csfasta or qual file
This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields a subset of the read records which are referenced by the supplied iterable indices.
The subset consists of the reads at the positions specified by indices, with index 0 being the first read in the file. Each read is returned as a list of lines.
The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.
Example usage (returns 1st, 3rd and 5th reads only):
>>> for r in getreads_subset('illumina_R1.fq',(0,2,4)):
...     print(r)
- Parameters:
filen (str) – path of the file to fetch reads from
indices (list) – list (or other iterable) of 0-based read indices to return
- Yields:
List – next read record from the file, as a list of lines.
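The index-based selection can be sketched independently of any file handling. The helper select_by_index below is hypothetical and only illustrates the idea; it assumes that records are yielded in file order (not in the order the indices were given) and that a repeated index yields its read once. Whether getreads_subset behaves the same way in those edge cases is not stated in this documentation.

```python
def select_by_index(records, indices):
    """Yield the records whose 0-based position appears in `indices`.

    Hypothetical helper for illustration only, not the bcftbx code.
    """
    wanted = set(indices)  # a set gives fast membership tests
    if not wanted:
        return
    last = max(wanted)
    for i, record in enumerate(records):
        if i in wanted:
            yield record
        if i >= last:
            # No later record can be selected, so stop reading early
            break
```

Any iterable of read records can be passed as records, for example a list of four-line Fastq records or the output of a read generator.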
- bcftbx.ngsutils.getreads_regex(filen, pattern)
Fetch matching reads from Fastq, csfasta or qual file
This generator function iterates through a sequence file (Fastq, csfasta or qual), and yields a subset of read records. Each read is returned as a list of lines.
The subset consists of the reads which match the supplied regular expression.
The file can be gzipped; this function should handle this invisibly provided that the file extension is ‘.gz’.
Example usage:
>>> for r in getreads_regex('illumina_R1.fq',"2102:3130"):
...     print(r)
- Parameters:
filen (str) – path of the file to fetch reads from
pattern (str) – Python regular expression pattern
- Yields:
List – next read record from the file, as a list of lines.
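Regular-expression filtering of read records can be sketched as follows. The helper select_by_regex is hypothetical, and it assumes that a read "matches" when the pattern is found anywhere in the text of the record (all lines joined together); this documentation does not say which part of the record getreads_regex tests, so treat that as an assumption.

```python
import re


def select_by_regex(records, pattern):
    """Yield the records whose text contains a match for `pattern`.

    Hypothetical helper for illustration only, not the bcftbx code.
    """
    regex = re.compile(pattern)  # compile once, reuse for every record
    for record in records:
        # Assumption: the pattern may match anywhere in the record,
        # including the sequence and quality lines
        if regex.search('\n'.join(record)):
            yield record
```

Under this assumption a pattern such as "2102:3130" selects reads whose header carries those coordinates, but a loosely written pattern could also match sequence or quality strings, so anchoring the pattern to header syntax is worth considering.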