MACS: Model-based Analysis for ChIP-Seq

Introduction

With the advancement of sequencing technologies, Chromatin Immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has become a popular method for studying genome-wide protein-DNA interactions. To address the need for a robust ChIP-Seq analysis tool, we introduce Model-based Analysis of ChIP-Seq (MACS), a tool for identifying transcription factor binding sites. MACS accounts for the complexity of the genome when assessing the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining sequencing tag position and orientation. MACS can be applied to ChIP-Seq data alone or together with a control sample, which increases specificity. As a general peak caller, MACS can also be used with any “DNA enrichment assay” to answer the fundamental question: where are the regions with significant read coverage compared to random background?

Changes for MACS (3.0.4)

Features added

hmmratac now supports -f FRAG, enabling direct processing of single-cell fragment files.

Note: because FRAG uses a different internal subsampling implementation from BEDPE/BAMPE, results can differ slightly between collapsed FRAG input and standard paired-end inputs.

Added --barcodes and --max-count to hmmratac so peak calling can be restricted to a barcode subset and/or capped by fragment count.

For example, to call accessible regions for one cell type in an scATAC-seq dataset: --barcodes celltype1_barcodes.txt --max-count 2.
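The barcode-subset workflow above can be sketched from Python. This is a minimal illustration, not part of MACS3: the helper names, the cell-type dictionary, the fragment file name, and the one-barcode-per-line layout of the barcode file are all assumptions; only the flags (-i, -f FRAG, -n, --barcodes, --max-count) come from this document.

```python
from pathlib import Path


def write_barcode_subset(cell_types, wanted, out_path):
    """Write the barcodes annotated as `wanted` to out_path.

    Assumption: the barcode file holds one barcode per line.
    Returns the number of barcodes written.
    """
    selected = sorted(bc for bc, ct in cell_types.items() if ct == wanted)
    Path(out_path).write_text("".join(f"{bc}\n" for bc in selected))
    return len(selected)


def hmmratac_subset_args(fragment_file, barcode_file, name, max_count=None):
    """Build (but do not run) an hmmratac command line for a barcode subset."""
    args = ["macs3", "hmmratac", "-i", str(fragment_file), "-f", "FRAG",
            "--barcodes", str(barcode_file), "-n", name]
    if max_count is not None:
        args += ["--max-count", str(max_count)]
    return args


if __name__ == "__main__":
    # Hypothetical annotation: barcode -> cell type label.
    annotation = {"AAAC-1": "celltype1", "AAAG-1": "celltype2", "AACC-1": "celltype1"}
    write_barcode_subset(annotation, "celltype1", "celltype1_barcodes.txt")
    print(" ".join(hmmratac_subset_args(
        "fragments.tsv.gz", "celltype1_barcodes.txt", "celltype1", max_count=2)))
```

Building the argument list separately from running it keeps the sketch usable on machines where macs3 is not installed; the list can be passed to subprocess.run once it is.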

Bugs fixed

PETrackII.pileup_bdg now wraps pileup outputs in array.array before passing them to bedGraphTrackI, preventing runtime type errors when writing bedGraph tracks from FRAG inputs.
PETrackII.sample_percent* now correctly allows zero-percent downsampling, preserving expected CLI behavior when the balance target contains no fragments.
Fixed an overflow path in PETrackI when updating total track length by explicitly casting int32 values to ulonglong.
Fixed a pvalue_stat issue that could yield nan when a region with the same p-value exceeded 2e9 bp (int32 limit).
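The last two fixes both concern lengths that outgrow a signed 32-bit integer. The following is only an illustration of that failure mode using the standard ctypes module; it is not the MACS3 code, and the function names are made up for this sketch.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, just above 2e9


def add_as_int32(total, length):
    """Add two lengths and keep only 32 signed bits, as a C int32 would."""
    return ctypes.c_int32(total + length).value


def add_widened(total, length):
    """Same addition, stored in an unsigned 64-bit integer instead."""
    return ctypes.c_ulonglong(total + length).value


if __name__ == "__main__":
    total, length = 2_000_000_000, 500_000_000
    print(add_as_int32(total, length))  # wraps around to a negative number
    print(add_widened(total, length))   # 2500000000
```

A negative accumulated length is the kind of value that can later turn into nan in a downstream statistic, which is why widening the type before adding avoids the problem.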

Documentation

Cleaned up source-code docstrings and expanded API docs for key Signal and IO modules.
Added a Jupyter notebook demo showing single-cell ATAC-seq processing with the MACS3 API.

Install

The common way to install MACS is through PyPI or conda. Please check the INSTALL document for details.

MACS3 is tested with GitHub Actions on every push and PR on the following architectures:

x86_64 (Ubuntu 22, Python 3.9, 3.10, 3.11, 3.12, 3.13)
aarch64 (Ubuntu 22, Python 3.10)
armv7 (Ubuntu 22, Python 3.10)
ppc64le (Ubuntu 22, Python 3.10)
s390x (Ubuntu 22, Python 3.10)
Apple chips (Mac OS 13, Python 3.9, 3.10, 3.11, 3.12, 3.13)

In general, you can install from PyPI with pip install macs3. Using a virtual environment is highly recommended. Alternatively, unzip the released package downloaded from GitHub and run pip install . in the unpacked directory. Please note that we haven’t tested installation on any Windows OS, so currently only Linux and Mac OS systems are supported. Also, on aarch64, armv7, ppc64le and s390x, the results from the hmmratac subcommand may not be consistent with the results from x86 or Apple chips, for unknown reasons potentially related to the scientific computing libraries MACS3 depends on, such as NumPy, SciPy, hmmlearn and scikit-learn.
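The platform caveats above can be checked before running an analysis. The sketch below is a hypothetical helper, not something MACS3 ships; it assumes that platform.machine() reports names beginning with the architecture labels listed above (for example armv7l for armv7) and that Apple chips report arm64.

```python
import platform

# Architectures named in this README as possibly giving different hmmratac results.
HMMRATAC_CAVEAT_ARCHS = ("aarch64", "armv7", "ppc64le", "s390x")


def platform_notes(system=None, machine=None):
    """Return a list of caveats for the given (or current) platform."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    notes = []
    if system == "Windows":
        notes.append("Windows is untested; only Linux and Mac OS are supported.")
    if machine.startswith(HMMRATAC_CAVEAT_ARCHS):
        notes.append(
            f"hmmratac results on {machine} may differ from x86 or Apple chips.")
    return notes


if __name__ == "__main__":
    for note in platform_notes():
        print("NOTE:", note)
```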

Usage

Example for regular peak calling on TF ChIP-seq:

macs3 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01

Example for broad peak calling on Histone Mark ChIP-seq:

macs3 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1

Example for peak calling on ATAC-seq (paired-end mode):

macs3 callpeak -f BAMPE -t ATAC.bam -g hs -n test -B -q 0.01

Example for peak calling on ATAC-seq with HMMRATAC:

macs3 hmmratac -i ATAC.bam -f BAMPE -n test
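When these commands are driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch with hypothetical helper names; it uses only the callpeak flags shown in the examples above and runs the command only if a macs3 executable is found on PATH.

```python
import shutil
import subprocess


def callpeak_args(treatment, name=None, control=None, fmt=None, gsize="hs",
                  qvalue=None, bedgraph=False, broad_cutoff=None):
    """Build a `macs3 callpeak` argument list from the flags used in this README."""
    args = ["macs3", "callpeak", "-t", treatment]
    if control:
        args += ["-c", control]
    if fmt:
        args += ["-f", fmt]
    args += ["-g", gsize]
    if name:
        args += ["-n", name]
    if bedgraph:
        args.append("-B")
    if qvalue is not None:
        args += ["-q", str(qvalue)]
    if broad_cutoff is not None:
        args += ["--broad", "--broad-cutoff", str(broad_cutoff)]
    return args


def run_if_available(args):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(args[0]) is None:
        return None
    return subprocess.run(args, check=True)


if __name__ == "__main__":
    tf_args = callpeak_args("ChIP.bam", name="test", control="Control.bam",
                            fmt="BAM", qvalue=0.01, bedgraph=True)
    print(" ".join(tf_args))
```

Passing a list to subprocess.run, rather than a single shell string, avoids quoting problems with file names that contain spaces.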

MACS3 currently provides 14 functions as subcommands. Please click on the link to see the detailed description of each subcommand.

Subcommand	Description
`callpeak`	Main MACS3 Function to call peaks from alignment results.
`bdgpeakcall`	Call peaks from bedGraph file.
`bdgbroadcall`	Call nested broad peaks from bedGraph file.
`bdgcmp`	Comparing two signal tracks in bedGraph format.
`bdgopt`	Operate the score column of bedGraph file.
`cmbreps`	Combine bedGraph files of scores from replicates.
`bdgdiff`	Differential peak detection based on paired four bedGraph files.
`filterdup`	Remove duplicate reads, then save in BED/BEDPE format file.
`predictd`	Predict d or fragment size from alignment results. In case of PE data, report the average insertion/fragment size from all pairs.
`pileup`	Pileup aligned reads (single-end) or fragments (paired-end)
`randsample`	Randomly choose a number/percentage of total reads, then save in BED/BEDPE format file.
`refinepeak`	Take raw reads alignment, refine peak summits.
`callvar`	Call variants in given peak regions from the alignment BAM files.
`hmmratac`	Dedicated peak calling based on Hidden Markov Model for ATAC-seq or scATAC-seq data.
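To make the idea behind a pileup track concrete, here is a toy sketch that turns paired-end fragments on one chromosome into bedGraph-style runs of constant depth. It is an illustration of the concept only, not the MACS3 implementation; the function name and the half-open (start, end) convention are assumptions made for this example.

```python
from collections import defaultdict


def pileup(fragments):
    """Return (start, end, depth) runs for half-open fragments on one chromosome.

    Regions with zero coverage are omitted from the output.
    """
    # +1 where a fragment starts, -1 where it ends.
    delta = defaultdict(int)
    for start, end in fragments:
        if end <= start:
            continue  # skip empty or inverted intervals
        delta[start] += 1
        delta[end] -= 1

    runs = []
    depth = 0
    prev = None
    for pos in sorted(delta):
        if prev is not None and depth > 0:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                # Same depth as the adjacent run: extend it instead of splitting.
                runs[-1] = (runs[-1][0], pos, depth)
            else:
                runs.append((prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return runs


if __name__ == "__main__":
    for start, end, depth in pileup([(0, 10), (5, 15), (20, 25)]):
        print(f"chr1\t{start}\t{end}\t{depth}")
```

Sorting the change points once and sweeping across them keeps the work proportional to the number of fragments rather than to the chromosome length.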

For advanced usage, such as running macs3 in a modular way, please read the advanced usage document. There is also a Q&A document where we have collected common questions from users.

Contribute

Please read our CODE OF CONDUCT and How to contribute documents. If you have any questions or suggestions, or just want to have conversations with developers and other users in the community, we recommend using MACS Discussions instead of posting to our Issues page.

Acknowledgement

The MACS3 project is sponsored by . We particularly want to thank the user community for their support, feedback and contributions over the years.

Citation

2008: Model-based Analysis of ChIP-Seq (MACS)