randsample
Overview
The randsample command is part of the MACS3 suite of tools and is
used to randomly sample a given number or percentage of tags from
alignment files. This can be useful in ChIP-Seq analysis when a
subset of the data is required for downstream analysis.
Detailed Description
The randsample command takes one or more input alignment files and
produces an output file containing the randomly sampled tags. Tags are
sampled according to either the percentage or the total number of tags
to be kept.
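To illustrate the two sampling modes, here is a minimal Python sketch. It is not MACS3's actual implementation. It assumes that tags are grouped per chromosome, that a target number is first converted into a fraction of the total, and that each chromosome then keeps a rounded share of its shuffled tags. The function name sample_tags and the toy data are hypothetical.

```python
import random


def sample_tags(tags_by_chrom, percentage=None, number=None, seed=None):
    """Hypothetical sketch: keep a random subset of tags per chromosome."""
    # Exactly one of the two modes may be chosen.
    if (percentage is None) == (number is None):
        raise ValueError("give exactly one of percentage or number")

    total = sum(len(tags) for tags in tags_by_chrom.values())
    if number is not None:
        # Assumption: a target number is turned into a fraction of all tags.
        fraction = min(1.0, number / total) if total else 0.0
    else:
        fraction = percentage / 100.0

    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    sampled = {}
    for chrom in sorted(tags_by_chrom):
        tags = list(tags_by_chrom[chrom])
        rng.shuffle(tags)
        keep = int(round(fraction * len(tags)))  # rounded per chromosome
        sampled[chrom] = sorted(tags[:keep])
    return sampled


# Toy data: 100, 30 and 15 tag positions on three chromosomes (145 in total).
toy = {
    "chr1": list(range(0, 1000, 10)),
    "chr2": list(range(0, 300, 10)),
    "chr3": list(range(0, 150, 10)),
}

by_number = sample_tags(toy, number=50, seed=7)
kept = sum(len(tags) for tags in by_number.values())
print(kept)  # 49 here, not exactly 50, because of per-chromosome rounding
```

Under these assumptions, the number of tags kept can differ slightly from the requested number, and repeating the call with the same seed returns the same subset.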
When -p 100 is used, meaning that all reads are kept, the randsample
command can be used to convert any format that MACS3 supports to BED
format (or BEDPE if the input is in BAMPE format). This gives the same
result as using filterdup --keep-dup all to convert other formats into
BED/BEDPE format.
Please note that when writing BED format output for a single-end
dataset, MACS3 assumes that all reads have the same length, taken
either from the -s setting or from auto-detection.
Command Line Options
Here is a brief overview of the randsample options:
-i or --ifile: Alignment file. If multiple files are given as ‘-i A B C’, then they will all be read and combined. REQUIRED.
-p or --percentage: Percentage of tags you want to keep. Input 80.0 for 80%. This option can’t be used at the same time as -n/--number. If the setting is 100, it will keep all the reads and convert any format that MACS3 supports into BED or BEDPE (if input is BAMPE) format. REQUIRED (either -p or -n must be given).
-n or --number: Number of tags you want to keep. Input 8000000 or 8e+6 for 8 million. This option can’t be used at the same time as -p/--percentage. Note that the number of tags in the output is only approximately the number specified here. REQUIRED (either -p or -n must be given).
--seed: Set the random seed while downsampling data. Must be a non-negative integer in order to be effective. If you want more reproducible results, please specify a random seed and record it. DEFAULT: not set
-o or --ofile: Output BED file name. If not specified, will write to standard output. Note, if the input format is BAMPE or BEDPE, the output will be in BEDPE format. DEFAULT: stdout
--outdir: If specified, all output files will be written to that directory. DEFAULT: the current working directory
-s or --tsize: Tag size. This will override the auto-detected tag size. DEFAULT: not set
-f or --format: Format of the tag file.
    AUTO: MACS3 will pick a format from “AUTO”, “BED”, “ELAND”, “ELANDMULTI”, “ELANDEXPORT”, “SAM”, “BAM”, “BOWTIE”, “BAMPE”, and “BEDPE”. Please check the definition in the README file if you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE or BAMPE/BEDPE. DEFAULT: “AUTO”
--buffer-size: Buffer size for incrementally increasing the internal array size to store read alignment information. In most cases, you don’t have to change this parameter. However, if there are a large number of chromosomes/contigs/scaffolds in your alignment, it’s recommended to specify a smaller buffer size in order to decrease memory usage (but reading the alignment files will take longer). The minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 8 Bytes. DEFAULT: 100000
--verbose: Set the verbose level. 0: only show critical messages, 1: show additional warning messages, 2: show process information, 3: show debug messages. If you want to know where the duplicate reads are, use 3. DEFAULT: 2
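The memory estimate given for --buffer-size can be worked through with a few lines of Python. This is only arithmetic on the formula quoted above (# of CHROMOSOME * BUFFER_SIZE * 8 Bytes); the function name and the sequence counts are hypothetical examples, not values taken from MACS3.

```python
def min_buffer_memory_bytes(n_chromosomes, buffer_size=100000):
    """Estimate from the quoted formula: chromosomes * buffer size * 8 bytes."""
    return n_chromosomes * buffer_size * 8


# Hypothetical cases: a small genome, then a highly fragmented assembly
# with the default buffer size and with a reduced one.
for n_chrom, buf in [(25, 100000), (500000, 100000), (500000, 1000)]:
    mem = min_buffer_memory_bytes(n_chrom, buf)
    print(f"{n_chrom} sequences, buffer {buf}: about {mem / 1e9:.2f} GB")
```

With 25 sequences the default buffer needs only about 0.02 GB, while an assembly with 500000 scaffolds would request about 400 GB at the default and about 4 GB with a buffer size of 1000, which is why a smaller buffer is recommended in that situation.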
Example Usage
Here is an example of how to use the randsample command:
macs3 randsample -i treatment.bam -o sampled.bed -f BAM -p 10
In this example, the program will randomly sample 10 percent of the
total tags from the treatment.bam file and write the result to
sampled.bed.
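When such a call is scripted, the rule that -p and -n cannot be combined can be checked before MACS3 is started. The following Python sketch assembles the argument list using only the options described above. The helper name build_randsample_cmd is hypothetical and the seed value 42 is an arbitrary example.

```python
import shlex


def build_randsample_cmd(inputs, output, percentage=None, number=None,
                         seed=None, fmt=None):
    """Hypothetical helper: assemble a macs3 randsample argument list."""
    # -p and -n are mutually exclusive, and one of them is needed.
    if (percentage is None) == (number is None):
        raise ValueError("use exactly one of percentage (-p) or number (-n)")
    cmd = ["macs3", "randsample", "-i", *inputs, "-o", output]
    if fmt is not None:
        cmd += ["-f", fmt]
    if percentage is not None:
        cmd += ["-p", str(percentage)]
    else:
        cmd += ["-n", str(number)]
    if seed is not None:
        # The seed must be a non-negative integer to take effect.
        if seed < 0:
            raise ValueError("seed must be a non-negative integer")
        cmd += ["--seed", str(seed)]
    return cmd


cmd = build_randsample_cmd(["treatment.bam"], "sampled.bed",
                           percentage=10, seed=42, fmt="BAM")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where macs3 is installed. Recording the seed together with the command makes it possible to repeat the same downsampling later.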