Command reference
=================

.. note::

   This documentation has been auto-generated from the command help

The following utilities are available:

.. contents::
   :local:

.. _reference_analyse_solid_run:

analyse_solid_run.py
********************

::

    usage: analyse_solid_run.py [-h] [--version] [--only] [--report]
                                [--report-paths] [--xls] [--verify] [--layout]
                                [--rsync] [--copy COPY_PATTERN]
                                [--gzip GZIP_PATTERN] [--md5 MD5_PATTERN]
                                [--md5sum] [--no-warnings] [--debug]
                                solid_run_dir [solid_run_dir ...]

    Utility for performing various checks and operations on SOLiD run
    directories. If a single solid_run_dir is specified then
    analyse_solid_run.py automatically finds and operates on all associated
    directories from the same instrument and with the same timestamp.

    positional arguments:
      solid_run_dir        SOLiD run directory to operate on

    optional arguments:
      -h, --help           show this help message and exit
      --version            show program's version number and exit
      --only               only operate on the specified solid_run_dir, don't
                           locate associated run directories
      --report             print a report of the SOLiD run
      --report-paths       in report mode, also print full paths to primary
                           data files
      --xls                write report to Excel spreadsheet
      --verify             do verification checks on SOLiD run directories
      --layout             generate script for laying out analysis directories
      --rsync              generate script for rsyncing data
      --copy COPY_PATTERN  copy primary data files to pwd from specific library
                           where names match COPY_PATTERN, which should be of
                           the form '/'
      --gzip GZIP_PATTERN  make gzipped copies of primary data files in pwd
                           from specific libraries where names match
                           GZIP_PATTERN, which should be of the form '/'
      --md5 MD5_PATTERN    calculate md5sums for primary data files from
                           specific libraries where names match MD5_PATTERN,
                           which should be of the form '/'
      --md5sum             calculate md5sums for all primary data files
                           (equivalent to --md5=*/*)
      --no-warnings        suppress warning messages
      --debug              turn on debugging output (nb overrides
                           --no-warnings)

.. _reference_annotate_probesets:

annotate_probesets.py
*********************

::

    usage: annotate_probesets.py [-h] [--version] [-o OUT_FILE] IN_FILE

    Annotate probeset list based on name: reads in first column of
    tab-delimited input file 'probe_set_file' as a list of probeset names and
    outputs these names to another tab-delimited file with a description for
    each. Output file name can be specified with the -o option, otherwise it
    will be the input file name with '_annotated' appended.

    positional arguments:
      IN_FILE      input probeset file

    optional arguments:
      -h, --help   show this help message and exit
      --version    show program's version number and exit
      -o OUT_FILE  specify output file name

.. _reference_best_exons:

best_exons.py
*************

::

    usage: best_exons.py [-h] [--version]
                         [--rank-by {log2_fold_change,p_value}]
                         [--probeset-col PROBESET_COL]
                         [--gene-symbol-col GENE_SYMBOL_COL]
                         [--log2-fold-change-col LOG2_FOLD_CHANGE_COL]
                         [--p-value-col P_VALUE_COL] [--debug]
                         EXONS_IN BEST_EXONS

    Read exon and gene symbol data from EXONS_IN and pick the top three exons
    for each gene symbol, then output averages of the associated values to
    BEST_EXONS.

    positional arguments:
      EXONS_IN              input file with exon and gene symbol data
      BEST_EXONS            output file averages from top three exons for each
                            gene symbol

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --rank-by {log2_fold_change,p_value}
                            select the criterion for ranking the 'best' exons;
                            possible options are: 'log2_fold_change' (default),
                            or 'p_value'.
      --probeset-col PROBESET_COL
                            specify column with probeset names (default=0,
                            columns start counting from zero)
      --gene-symbol-col GENE_SYMBOL_COL
                            specify column with gene symbols (default=1,
                            columns start counting from zero)
      --log2-fold-change-col LOG2_FOLD_CHANGE_COL
                            specify column with log2 fold change (default=12,
                            columns start counting from zero)
      --p-value-col P_VALUE_COL
                            specify column with p-value (default=13; columns
                            start counting from zero)
      --debug               Turn on debug output

.. _reference_bowtie_mapping_stats:

bowtie_mapping_stats.py
***********************

::

    usage: bowtie_mapping_stats.py [-h] [--version] [-o xls_file] [-t]
                                   BOWTIE_LOG_FILE [BOWTIE_LOG_FILE ...]

    Extract mapping statistics for each sample referenced in the input bowtie
    log files and summarise the data in an XLS spreadsheet. Handles output
    from both Bowtie and Bowtie2.

    positional arguments:
      BOWTIE_LOG_FILE  logfile output from Bowtie or Bowtie2

    optional arguments:
      -h, --help       show this help message and exit
      --version        show program's version number and exit
      -o xls_file      specify name of the output XLS file (otherwise defaults
                       to 'mapping_summary.xls').
      -t               write data to tab-delimited file in addition to the XLS
                       file. The tab file will have the same name as the XLS
                       file, with the extension replaced by .txt

.. _reference_extract_reads:

extract_reads.py
****************

::

    usage: extract_reads.py [-h] [--version] [-m PATTERN] [-n N] [-s SEED]
                            infile [infile ...]

    Extract subsets of reads from each of the supplied files according to
    specified criteria (e.g. random, matching a pattern etc). Input files can
    be any mixture of FASTQ (.fastq, .fq), CSFASTA (.csfasta) and QUAL
    (.qual).

    positional arguments:
      infile                input FASTQ, CSFASTA, or QUAL file

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -m PATTERN, --match PATTERN
                            extract records that match Python regular
                            expression PATTERN
      -n N                  extract N random reads from the input file(s). If
                            multiple files are supplied (e.g. R1/R2 pair) then
                            the same subsets will be extracted for each.
                            (Optionally a percentage can be supplied instead
                            e.g. '50%' to extract a subset of half the reads.)
      -s SEED, --seed SEED  specify seed for random number generator (used for
                            -n option; using the same seed should produce the
                            same 'random' sample of reads)

.. _reference_fastq_strand:

fastq_strand.py
***************

::

    Fastq_strand: version 1.14.0
    usage: fastq_strand.py [-h] [--version] [-g GENOMEDIR] [--subset SUBSET]
                           [-o OUTDIR] [-c FILE] [-n N] [--counts]
                           [--keep-star-output]
                           READ1 [READ2]

    Generate strandedness statistics for FASTQ or FASTQ pair, by running STAR
    using one or more genome indexes

    positional arguments:
      READ1                 R1 Fastq file
      READ2                 R2 Fastq file

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -g GENOMEDIR, --genome GENOMEDIR
                            path to directory with STAR index for genome to
                            use (use as an alternative to -c/--conf; can be
                            specified multiple times to include additional
                            genomes)
      --subset SUBSET       use a random subset of read pairs from the input
                            Fastqs; set to zero to use all reads (default:
                            10000)
      -o OUTDIR, --outdir OUTDIR
                            specify directory to write final outputs to
                            (default: current directory)
      -c FILE, --conf FILE  specify delimited 'conf' file with list of NAME
                            and STAR index directory pairs. NB if a conf file
                            is supplied then any indices specified on the
                            command line will be ignored
      -n N                  number of threads to run STAR with (default: 1)
      --counts              include the count sums for unstranded, 1st read
                            strand aligned and 2nd read strand aligned in the
                            output file (default: only include percentages)
      --keep-star-output    keep the output from STAR (default: delete outputs
                            on completion)

.. _reference_log_seq_data:

log_seq_data.sh
***************

::

    Usage: log_seq_data.sh [-d|-u] []
           log_seq_data.sh -c []
           log_seq_data.sh -i
           log_seq_data.sh -v

    Add, update or delete an entry for in , or verify entries.

    can be a primary data directory from a sequencer or a directory of
    derived data (e.g. analysis directory)

    By default an entry is added for the specified data directory; each entry
    is a tab-delimited line with the full path for the data directory
    followed by the UNIX timestamp and the optional description text.

    If doesn't exist then it will be created; if is already in the log file
    then it won't be added again.

    Options:
      -d  deletes an existing entry
      -u  update description for an existing entry (or creates a new one if
          an existing entry not found)
      -c  changes an existing entry, updating the directory path and
          (optionally) the description
      -i  print information about an entry
      -v  validates the entries in the logging file.

.. _reference_make_macs_xls:

make_macs_xls.py
****************

::

    usage: make_macs_xls.py [-h] [--version] MACS_OUTPUT [XLS_OUT]

    Create an XLS spreadsheet from the output of the MACS peak caller. is the
    output '.xls' file from MACS; if supplied then is the name to use for the
    output file, otherwise it will be called 'XLS_.xls'.

    positional arguments:
      MACS_OUTPUT  output .xls file from MACS
      XLS_OUT      output MS XLS file (defaults to 'XLS_.xls').

    optional arguments:
      -h, --help   show this help message and exit
      --version    show program's version number and exit

.. _reference_make_macs2_xls:

make_macs2_xls.py
*****************

::

    usage: make_macs2_xls.py [-h] [--version] [-f XLS_FORMAT] [-b]
                             MACS2_XLS [XLS_OUT]

    Create an XLS(X) spreadsheet from the output of the MACS2 peak caller.
    MACS2_XLS is the output '.xls' file from MACS2; if supplied then XLS_OUT
    is the name to use for the output file (otherwise it will be called
    'XLS_.xls(x)').

    positional arguments:
      MACS2_XLS             output '.xls' file from MACS2
      XLS_OUT               name to use for the output file (default is
                            'XLS_.xls(x)')

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -f XLS_FORMAT, --format XLS_FORMAT
                            specify the output Excel spreadsheet format; must
                            be one of 'xlsx' or 'xls' (default is 'xlsx')
      -b, --bed             write an additional TSV file with chrom,
                            abs_summit+100 and abs_summit-100 data as the
                            columns. (NB only possible for MACS2 run without
                            --broad)

.. _reference_manage_seqs:

manage_seqs.py
**************

::

    usage: manage_seqs.py [-h] [--version] [-o OUT_FILE] [-a APPEND_FILE]
                          [-d DESCRIPTION]
                          INFILE [INFILE ...]

    Read sequences and names from one or more INFILEs (which can be a mixture
    of FastQC 'contaminants' format and/or Fasta format), check for
    redundancy (i.e. sequences with multiple associated names) and
    contradictions (i.e. names with multiple associated sequences).

    positional arguments:
      INFILE          input sequences

    optional arguments:
      -h, --help      show this help message and exit
      --version       show program's version number and exit
      -o OUT_FILE     write all sequences to OUT_FILE in FastQC 'contaminants'
                      format
      -a APPEND_FILE  append sequences to existing APPEND_FILE (not compatible
                      with -o)
      -d DESCRIPTION  supply arbitrary text to write to the header of the
                      output file

.. _reference_md5checker:

md5checker.py
*************

::

    usage: md5checker.py -d SOURCE_DIR DEST_DIR
           md5checker.py -d FILE1 FILE2
           md5checker.py [ -o CHKSUM_FILE ] DIR
           md5checker.py [ -o CHKSUM_FILE ] FILE
           md5checker.py -c CHKSUM_FILE

    Compute and verify MD5 checksums for files and directories.

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -d, --diff            for two directories: check that contents of
                            directory DIR1 are present in DIR2 and have the
                            same MD5 sums; for two files: check that FILE1 and
                            FILE2 have the same MD5 sums
      -c, --check           read MD5 sums from the specified file and check
                            them
      -q, --quiet           suppress output messages and only report failures

    Directory comparison (-d, --diff):
      Check that the contents of SOURCE_DIR are present in TARGET_DIR and
      have matching MD5 sums. Note that files that are only present in
      TARGET_DIR are not reported.

    File comparison (-d, --diff):
      Check that FILE1 and FILE2 have matching MD5 sums.

    Checksum generation:
      MD5 checksums are calculated for all files in the specified directory,
      or for a single specified file.

      -o CHKSUM_FILE, --output CHKSUM_FILE
                            optionally write computed MD5 sums to CHKSUM_FILE
                            (otherwise the sums are written to stdout). The
                            output format is the same as that used by the
                            Linux 'md5sum' tool.

    Checksum verification (-c, --check):
      Check MD5 sums for each of the files listed in the specified
      CHKSUM_FILE relative to the current directory. This option behaves the
      same as the Linux 'md5sum' tool.

.. _reference_prep_sample_sheet:

prep_sample_sheet.py
********************

::

    usage: prep_sample_sheet.py [-h] [--version] [-o SAMPLESHEET_OUT] [-f FMT]
                                [-V] [--fix-spaces] [--fix-duplicates]
                                [--fix-empty-projects] [--set-id SAMPLE_ID]
                                [--set-project SAMPLE_PROJECT]
                                [--reverse-complement-i5] [--ignore-warnings]
                                [--include-lanes LANES]
                                [--set-adapter ADAPTER]
                                [--set-adapter-read2 ADAPTER_READ2]
                                [--truncate-barcodes BARCODE_LEN] [--miseq]
                                SAMPLE_SHEET

    Utility to prepare SampleSheet files from Illumina sequencers. Can be
    used to view, validate and update or fix information such as sample IDs
    and project names before running BCL to FASTQ conversion.

    positional arguments:
      SAMPLE_SHEET          input sample sheet file

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -o SAMPLESHEET_OUT    output new sample sheet to SAMPLESHEET_OUT
      -f FMT, --format FMT  specify the format of the output sample sheet
                            written by the -o option; can be either 'CASAVA'
                            or 'IEM' (defaults to the format of the original
                            file)
      -V, --view            view predicted outputs from sample sheet
      --fix-spaces          replace spaces in sample ID and project fields
                            with underscores
      --fix-duplicates      append unique indices to sample IDs where the
                            original ID and project name combination are
                            duplicated
      --fix-empty-projects  create sample project names where these are blank
                            in the original sample sheet
      --set-id SAMPLE_ID    update/set the values in sample ID field;
                            SAMPLE_ID should be of the form ':', where is a
                            single integer (e.g. 1), a set of integers (e.g.
                            1,3,...), a range (e.g. 1-3), or a combination
                            (e.g. 1,3-5,7)
      --set-project SAMPLE_PROJECT
                            update/set values in the sample project field;
                            SAMPLE_PROJECT should be of the form '[:]', where
                            the optional part can be a single integer (e.g.
                            1), a set of integers (e.g. 1,3,...), a range
                            (e.g. 1-3), or a combination (e.g. 1,3-5,7). If no
                            lanes are specified then all samples will have
                            their project set to
      --reverse-complement-i5
                            replace i5 index sequences with their reverse
                            complement
      --ignore-warnings     ignore warnings about spaces and duplicated
                            sampleID/sampleProject combinations when writing
                            new samplesheet.csv file
      --include-lanes LANES
                            specify a subset of lanes to include in the output
                            sample sheet; LANES should be single integer (e.g.
                            1), a list of integers (e.g. 1,3,...), a range
                            (e.g. 1-3) or a combination (e.g. 1,3-5,7).
                            Default is to include all lanes
      --set-adapter ADAPTER
                            set the adapter sequence in the 'Settings' section
                            to ADAPTER
      --set-adapter-read2 ADAPTER_READ2
                            set the adapter sequence for read 2 in the
                            'Settings' section to ADAPTER_READ2

    Deprecated options:
      --truncate-barcodes BARCODE_LEN
                            trim barcode sequences in sample sheet to number
                            of bases specified by BARCODE_LEN. Default is to
                            leave barcode sequences unmodified (deprecated;
                            only works for CASAVA-style sample sheets)
      --miseq               convert input MiSEQ sample sheet to
                            CASAVA-compatible format (deprecated; specify
                            -f/--format CASAVA to convert IEM sample sheet to
                            older format)

.. _reference_reorder_fasta:

reorder_fasta.py
****************

::

    usage: reorder_fasta.py [-h] [--version] FASTA

    Reorder the chromosome records in a FASTA file into karyotypic order.

    positional arguments:
      FASTA       FASTA file to reorder

    optional arguments:
      -h, --help  show this help message and exit
      --version   show program's version number and exit

.. _reference_sam2soap:

sam2soap.py
***********

::

    usage: sam2soap.py [-h] [--version] [-o SOAPFILE] [--debug] [SAMFILE]

    Convert SAM file to SOAP format - reads from stdin (or SAMFILE, if
    specified), and writes output to stdout unless -o option is specified.

    positional arguments:
      SAMFILE      SAM file to convert (or stdin if not specified)

    optional arguments:
      -h, --help   show this help message and exit
      --version    show program's version number and exit
      -o SOAPFILE  Output SOAP file name
      --debug      Turn on debugging output

.. _reference_split_fasta:

split_fasta.py
**************

::

    usage: split_fasta.py [-h] [--version] [fasta_file]

    Split input FASTA file with multiple sequences into multiple files each
    containing sequences for a single chromosome.

    positional arguments:
      fasta_file  input FASTA file to split

    optional arguments:
      -h, --help  show this help message and exit
      --version   show program's version number and exit

.. _reference_split_fastq:

split_fastq.py
**************

::

    usage: split_fastq.py [-h] [--version] [-l LANES] FASTQ

    Split input Fastq file into multiple output Fastqs where each output only
    contains reads from a single lane.

    positional arguments:
      FASTQ                 Fastq to split

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -l LANES, --lanes LANES
                            lanes to extract: can be a single integer, a
                            comma-separated list (e.g. 1,3), a range (e.g.
                            5-7) or a combination (e.g. 1,3,5-7). Default is
                            to extract all lanes in the Fastq

.. _reference_verify_paired:

verify_paired.py
****************

::

    usage: verify_paired.py [-h] [--version] R1.fastq R2.fastq

    Check that read headers for R1 and R2 fastq files are in agreement, and
    that the files form an R1/2 pair.

    positional arguments:
      R1.fastq    Fastq file with R1 reads
      R2.fastq    Fastq file with R2 reads to check against R1 reads

    optional arguments:
      -h, --help  show this help message and exit
      --version   show program's version number and exit

.. _reference_xrorthologs:

xrorthologs.py
**************

::

    usage: xrorthologs.py [-h] [--version] [--debug]
                          LOOKUPFILE SPECIES1 SPECIES2

    Cross-reference data from two species given a lookup file that maps
    probeset IDs from one species onto those of the other. LOOKUPFILE is
    tab-delimited file with one probe set for species 1 per line in first
    column and a comma-separated list of the equivalent probe sets for
    species 2 in the fourth column. Data for the two species are in
    tab-delimited files SPECIES1 and SPECIES2. Output is two files:
    SPECIES1_appended.txt (SPECIES1 with the cross-referenced data from
    SPECIES2 appended to each line) and SPECIES2_appended.txt (SPECIES2 with
    SPECIES1 data appended).

    positional arguments:
      LOOKUPFILE  tab-delimited file with one probe set for species 1 per line
                  in first column and a comma-separated list of the equivalent
                  probe sets for species 2 in the fourth column
      SPECIES1    data for species 1
      SPECIES2    data for species 2

    optional arguments:
      -h, --help  show this help message and exit
      --version   show program's version number and exit
      --debug     Turn on debugging output
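
The LOOKUPFILE layout described above (species 1 probe set in the first column, comma-separated species 2 probe sets in the fourth column) can be illustrated with a short sketch. This is not the code used by xrorthologs.py: the function name ``parse_lookup`` and the probe set names are hypothetical, and skipping lines with fewer than four columns is an assumption made for the example.

```python
from collections import defaultdict


def parse_lookup(lines):
    """Build forward and reverse probe set maps from lookup-file lines.

    Assumes the documented layout: species 1 probe set in column 1 and a
    comma-separated list of species 2 probe sets in column 4. Lines with
    fewer than four columns are skipped (an assumption for this sketch).
    """
    forward = defaultdict(list)  # species 1 probe set -> species 2 probe sets
    reverse = defaultdict(list)  # species 2 probe set -> species 1 probe sets
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or not fields[0]:
            continue
        probe1 = fields[0]
        for probe2 in fields[3].split(","):
            probe2 = probe2.strip()
            if not probe2:
                continue
            # Record each pairing once, preserving first-seen order
            if probe2 not in forward[probe1]:
                forward[probe1].append(probe2)
            if probe1 not in reverse[probe2]:
                reverse[probe2].append(probe1)
    return dict(forward), dict(reverse)


# Hypothetical lookup lines; columns 2 and 3 are not used here
example = [
    "probe_a\tx\ty\tprobe_1,probe_2\n",
    "probe_b\tx\ty\tprobe_2\n",
]
forward, reverse = parse_lookup(example)
print(forward)
print(reverse)
```

Keeping both directions of the mapping mirrors the two output files described above, where each species' data is extended with the matching data from the other species.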