# predictd

## Overview
The `predictd` command is part of the MACS3 suite of tools. It predicts
the expected DNA fragment size from alignment files, using
cross-correlation to find the shift that best aligns the read cutting
ends on the plus strand with those on the minus strand.
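
To make the idea concrete, here is a toy sketch of strand
cross-correlation. It is not MACS3's implementation. MACS3 works on tag
profiles from the regions selected with `--bw` and `-m`, and its exact
scoring may differ. This sketch assumes a hypothetical input of
`<strand> <5' end position>` lines for a single chromosome. For each
candidate shift *d*, it computes the unnormalized cross-correlation
(the sum over positions *p* of plus[*p*] × minus[*p* + *d*]) and prints
the *d* with the highest score. The function name `best_shift` is
illustrative.

```bash
#!/usr/bin/env bash
# Toy strand cross-correlation (illustration only, NOT MACS3's code).
# stdin: lines of "<strand> <5' end position>" for one chromosome.
# Args: dmin dmax, the range of shifts to try (dmin is loosely analogous
# to --d-min). Prints the best shift, or NA if no pair matches.
best_shift() {
  awk -v dmin="$1" -v dmax="$2" '
    $1 == "+" { plus[$2]++ }    # count plus-strand 5 prime ends per position
    $1 == "-" { minus[$2]++ }   # count minus-strand 5 prime ends per position
    END {
      best = 0; bestd = "NA"
      for (d = dmin; d <= dmax; d++) {
        score = 0
        # Unnormalized cross-correlation at lag d.
        for (p in plus) {
          q = p + d
          if (q in minus) score += plus[p] * minus[q]
        }
        if (score > best) { best = score; bestd = d }
      }
      print bestd
    }'
}
```

For example, `printf '+ 100\n+ 200\n- 150\n- 250\n' | best_shift 1 100`
prints `50`, because both plus-strand ends have a minus-strand end
exactly 50 bp downstream.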

## Detailed Description

The `predictd` command takes one or more alignment files and predicts
*d*, the fragment size, from the alignment results. For paired-end
data, it instead reports the average insertion/fragment size over all
pairs. Note that `predictd` does not filter duplicate reads or scale
by sequencing depth, so you may need to pre- or post-process your
data, for example with the `filterdup` or `randsample` commands.
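
One way to chain these steps is sketched below. It assumes
single-end BAM input and the human genome size (`hs`). The file names
`dedup.bed` and `subsample.bed` and the function name
`preprocess_then_predictd` are hypothetical placeholders. The sketch
only uses `filterdup`, `randsample`, and `predictd` options that appear
in MACS3's own help, but check them against your installed version.

```bash
#!/usr/bin/env bash
# Sketch: deduplicate (and optionally subsample) reads before predictd,
# because predictd itself neither filters duplicates nor scales by depth.
# File names and the "hs" genome size are illustrative placeholders.
# Args: BAM file, optional integer percentage of reads to keep (default 100).
preprocess_then_predictd() {
  local bam=$1 percent=${2:-100}
  local input=dedup.bed

  # Keep at most one read at the same location (filterdup writes BED).
  macs3 filterdup -i "$bam" -f BAM -g hs --keep-dup 1 -o dedup.bed || return

  # Optionally keep only a percentage of reads, e.g. to match another sample.
  if [ "$percent" -lt 100 ]; then
    macs3 randsample -i dedup.bed -f BED -p "$percent" -o subsample.bed || return
    input=subsample.bed
  fi

  # Predict d from the processed reads.
  macs3 predictd -i "$input" -f BED -g hs --rfile predictd_model.R
}
```

For example, `preprocess_then_predictd ChIP.bam 50` deduplicates
`ChIP.bam`, keeps about half of the remaining reads, and then runs
`predictd` on the result.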

If the alignment file is single-end, `predictd` saves a model file. This
is an R script named by `--rfile`, and you can run it with R to draw the
model as a PDF. The command line output reports the predicted *d* on the
line containing `predicted fragment length is` and alternative *d*
values on the line containing `alternative fragment length(s) may be`.

If the alignment file is paired-end (`-f BAMPE` or `-f BEDPE`), no
model file is generated. Instead, the average fragment size is
reported in the command line output on the line containing
`Average insertion length of all pairs is`.
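
If you capture the output in a script, you can pull these values out
of the log text. The sketch below assumes that each value is printed
on the same line, directly after the quoted phrase. It also assumes
that MACS3 writes these messages to standard error, as its logging
normally does. The function name `value_after` is hypothetical, and the
exact log format may differ between versions.

```bash
#!/usr/bin/env bash
# Print the first number (digits and commas, e.g. "228" or "228,392")
# that follows a literal phrase in the log text on stdin.
# Assumption: the value is on the same line as the phrase.
value_after() {
  awk -v phrase="$1" '
    {
      i = index($0, phrase)            # literal match, so "(s)" is safe
      if (i > 0) {
        rest = substr($0, i + length(phrase))
        if (match(rest, /[0-9][0-9,]*/)) {
          print substr(rest, RSTART, RLENGTH)
          exit                         # only report the first occurrence
        }
      }
    }'
}
```

For example, after `macs3 predictd -i ChIP.bam 2> predictd.log`, the
call `value_after 'predicted fragment length is' < predictd.log` prints
the predicted *d*. For paired-end runs, use the phrase
`Average insertion length of all pairs is` instead.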

## Command Line Options

Here is a brief overview of the `predictd` options:

- `-i` or `--ifile`: ChIP-seq alignment file. If multiple files are
  given as `-i A B C`, they will all be read and combined. REQUIRED.
- `-f` or `--format`: Format of the tag file: "AUTO", "BED", "ELAND",
  "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", "BOWTIE", "BAMPE", or
  "BEDPE". With "AUTO", MACS3 detects the format automatically.
  However, to compute the average insertion/fragment size from
  paired-end data, you must specify BAMPE or BEDPE explicitly, because
  `-f AUTO` does not recognize these two formats. Be aware that in PE
  mode, `-g`, `-s`, `--bw`, `--d-min`, `-m`, and `--rfile` have NO
  effect. DEFAULT: "AUTO"
- `-g` or `--gsize`: Effective genome size. See
  [`callpeak`](./callpeak.md) for details. DEFAULT: hs
- `-s` or `--tsize`: Tag size. This will override the auto-detected
  tag size. DEFAULT: Not set 
- `--bw`: Bandwidth for picking regions to compute the fragment
  size. This value is only used while building the shifting
  model. DEFAULT: 300 
- `--d-min`: Minimum fragment size in base pairs. Any predicted
  fragment size less than this will be excluded. DEFAULT: 20 
- `-m` or `--mfold`: Build the model from regions whose
  fold-enrichment against background falls within the MFOLD
  range. The fold-enrichment must be lower than the upper limit and
  higher than the lower limit. Use as `-m 10 30`. DEFAULT: 5 50
- `--outdir`: If specified, all output files will be written to that
  directory. DEFAULT: the current working directory
- `--rfile`: Filename of the R script for drawing the
  cross-correlation figure. DEFAULT: 'predictd_model.R'
- `--buffer-size`: Buffer size for incrementally increasing the
  internal array that stores read alignment information. In most
  cases you don't need to change this parameter. However, if your
  alignment contains a large number of chromosomes/contigs/scaffolds,
  a smaller buffer size is recommended to reduce memory usage, at the
  cost of slower reading of alignment files. The minimum memory needed
  to read an alignment file is about (number of chromosomes) ×
  BUFFER_SIZE × 8 bytes. DEFAULT: 100000
- `--verbose`: Set the verbose level of runtime messages. 0: only show
  critical messages, 1: show additional warning messages, 2: show
  process information, 3: show debug messages. DEFAULT: 2 
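
You can check the `--buffer-size` memory rule of thumb before a run.
The minimal sketch below assumes you have a file with one line per
chromosome/contig, such as a samtools `.fai` index or a chrom sizes
file. The function name `min_read_mem_bytes` is hypothetical, and the
result is only the rough lower bound stated above.

```bash
#!/usr/bin/env bash
# Rough lower bound on memory for reading an alignment file:
# (number of chromosomes) * BUFFER_SIZE * 8 bytes.
# Args: file with one line per chromosome/contig, optional buffer size
# (defaults to predictd's documented default of 100000).
min_read_mem_bytes() {
  local chrom_list=$1 buffer_size=${2:-100000}
  local n_chrom
  n_chrom=$(grep -c . "$chrom_list")   # count non-empty lines
  echo $(( n_chrom * buffer_size * 8 ))
}
```

As a worked example, an assembly with 50,000 scaffolds at the default
buffer size needs about 50,000 × 100,000 × 8 = 40,000,000,000 bytes
(about 40 GB). With `--buffer-size 1000`, it needs about 400,000,000
bytes (about 400 MB). This is why a smaller buffer is recommended for
highly fragmented assemblies.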

## Example Usage

Here is an example of how to use the `predictd` command:

```bash
macs3 predictd -i ChIP.bam --rfile model.R
```

Then run the generated R script to draw the model figure as a PDF:

```bash
Rscript model.R
```