ChIP-seq specific utilities¶

Scripts and tools for ChIP-seq specific tasks.

calc_coverage_stats.pl: stats from a coverage file
convertFastq2Fasta.pl: convert consensus fastq to fasta format
CreateChIPalignFileFromBed.pl: convert csfasta->BED for GLITR
getRandomTags_index.pl, getRandomTags_index_fastq.pl: extract random subsets of reads
make_macs2_xls.py: convert a MACS output file into an Excel spreadsheet
mean_coverage.pl: mean depth of read coverage from BAM file
run_DESeq.R

calc_coverage_stats.pl¶

Get stats for coverage using a coverage file from genomeCoverageBed -d

Format of the input is:

chr position count

Outputs mean and median for all positions including 0 count positions

NB requires perl Statistics::Descriptive module

convertFastq2Fasta.pl¶

Convert fastq formatted consensus from samtools pileup to fasta

Note

Note that this will be redundant as mpileup (the successor to pileup) has its own way of doing this. However it may be required for legacy projects.

Usage:

perl ~/ChIP_seq/convertFastq2Fasta.pl in.pileup.fq > out.fa

CreateChIPalignFileFromBed.pl¶

Convert csfasta->BED format file to ChIPalign format for GLITR peak caller.

Usage:

CreateChIPalignFileFromBed.pl in.bed out.align

getRandomTags_index.pl, getRandomTags_index_fastq.pl¶

Extract random subset of records from fasta and fastq sequence files.

getRandomTags_index.pl¶

Extract N random records from ChIP align fasta files (2-line records):

Usage:

getRandomTags_index.pl in.fasta N out.fasta

getRandomTags_index_fastq.pl¶

Extract N random records from fastq file (4-line records):

Usage:

getRandomTags_index_fastq.pl in.fastq N out.fastq

make_macs2_xls.py¶

Convert a MACS2 tab-delimited output file into an Excel (XLSX) spreadsheet.

Usage:

make_macs2_xls.py OPTIONS <macs_output_file>.xls [<xlsx_output_file>]

Options:

-f XLS_FORMAT, --format=XLS_FORMAT
                      specify the output Excel spreadsheet format; must be
                      one of 'xlsx' or 'xls' (default is 'xlsx')
-b, --bed             write an additional TSV file with chrom,
                      abs_summit+100 and abs_summit-100 data as the columns.
                      (NB only possible for MACS2 run without --broad)

If the xlsx_output_file isn’t specified then it defaults to XLS_<macs_output_file>.xlsx.

Note

To process output from MACS 1.4.2 and earlier use make_macs_xls.py; this version only supports .xls output and doesn’t provide either of the -f or -b options.

mean_coverage.pl¶

Mean depth of read coverage: calculates the average coverage of all the captured bases in a bam file and presents as a single number.

Originally posted by Michael James Clark on Biostar: http://biostar.stackexchange.com/questions/5181/tools-to-calculate-average-coverage-for-a-bam-file

Usage:

/path/to/samtools pileup in.bam | awk '{print $4}' | perl mean_coverage.pl

It can also be used for genomic regions:

/path/to/samtools view -b in.bam <genomic region> | /path/to/samtools pileup - | awk '{print $4}' | perl mean_coverage.pl

Note that this assumes every base is covered at least once (because samtools pileup doesn’t report bases with zero coverage).

run_DESeq.R¶

Usage:

runDESeq.R [input file] [generic figure label] [output file]

Run DESeq in R using a tab delimited file [input file] that has a column of chr_start_end called ‘regions’, and four columns of read counts for:

timeA_rep1 timeA_rep2 timeB_rep1 timeB_rep2

(‘conds’ order hard-wired).

A [generic figure label] adds specificity to the output diagrams (hard-wired). The final [output file] is created.