Fastq manipulation
Extract subsets of reads from Fastq files
The extract_reads.py utility extracts subsets of reads from each of the supplied Fastq files according to specified criteria (either a random subset of a specified number of reads, or reads matching a specified pattern).
Multiple files are assumed to be pairs (e.g. R1/R2 Fastqs) or groups (R1/I1/R2 Fastqs), with the same number of read records. The same subset will be extracted from each file, so that pairing/grouping is preserved.
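The idea of extracting the same subset from every file can be illustrated with a minimal Python sketch. This is not the implementation used by extract_reads.py; the function names read_fastq and extract_random_subset are hypothetical, and the sketch assumes simple four-line Fastq records held in memory.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield 4-line Fastq records (header, sequence, '+', quality) as tuples.

    Assumes each record occupies exactly four lines.
    """
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated Fastq record")
        yield record


def extract_random_subset(fastqs, n, seed=None):
    """Pick the same n record positions from every Fastq in `fastqs`.

    `fastqs` is a list of record lists, e.g. [r1_records, r2_records].
    Using one set of positions for all files keeps pairs/groups intact.
    """
    nreads = len(fastqs[0])
    if any(len(fq) != nreads for fq in fastqs):
        raise ValueError("input Fastqs have different numbers of reads")
    rng = random.Random(seed)
    # Sort the sampled positions so the output keeps the input order
    indices = sorted(rng.sample(range(nreads), min(n, nreads)))
    return [[fq[i] for i in indices] for fq in fastqs]
```

Because the positions are sampled once and then applied to every input, record i of the R1 subset still corresponds to record i of the R2 subset.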
Note
Input files can be any mixture of Fastq (.fastq, .fq), or CSFASTA (.csfasta) and QUAL (.qual) files.
Split multi-lane Fastq into individual lanes
Given a multi-lane Fastq file (that is, a Fastq file containing reads for several different sequencer lanes), the split_fastq.py utility splits that data into multiple output Fastqs where each file only contains reads from a single lane.
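As a rough illustration of lane splitting (not the code used by split_fastq.py), the sketch below groups records by lane number. It assumes Illumina-style read headers of the form @instrument:run:flowcell:lane:tile:x:y, where the lane is the fourth colon-separated field; the function names lane_from_header and split_by_lane are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def lane_from_header(header):
    """Return the lane number from an Illumina-style read header.

    Assumes the header's first token has seven colon-separated fields:
    @instrument:run:flowcell:lane:tile:x:y
    """
    fields = header.split()[0].split(":")
    if len(fields) != 7:
        raise ValueError("unrecognised header format: %r" % header)
    return int(fields[3])


def split_by_lane(lines):
    """Group 4-line Fastq records by lane.

    Returns a dict mapping lane number to a list of records, where each
    record is a list of the four original lines.
    """
    it = iter(lines)
    lanes = defaultdict(list)
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        lanes[lane_from_header(record[0])].append(record)
    return dict(lanes)
```

Each list in the returned dictionary could then be written to its own output file, one per lane.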
Verify that Fastq files are paired
The verify_paired.py utility verifies that two Fastqs form an R1/R2 pair, by checking that read headers for corresponding records from the input Fastq files are in agreement.
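One way such a header check could work is sketched below. This is an assumption about the general approach, not the logic of verify_paired.py: it treats two headers as being in agreement when their first whitespace-delimited token matches once any trailing /1 or /2 suffix is removed. The names read_id and headers_are_paired are hypothetical.

```python
from itertools import zip_longest


def read_id(header):
    """Return the part of a read header expected to match between R1 and R2.

    Takes the first whitespace-delimited token and drops a trailing
    '/1' or '/2' (older-style read number suffix), if present.
    """
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def headers_are_paired(r1_headers, r2_headers):
    """Return True if the headers agree record-by-record.

    Also returns False if one input has more records than the other.
    """
    for h1, h2 in zip_longest(r1_headers, r2_headers):
        if h1 is None or h2 is None:
            return False
        if read_id(h1) != read_id(h2):
            return False
    return True
```

Comparing records in step means a mismatch is detected at the first out-of-sync record, without loading either file fully into memory.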