MACS: Model-based Analysis for ChIP-Seq
Introduction
With the advancement of sequencing technologies, Chromatin Immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has become a popular method for studying genome-wide protein-DNA interactions. To address the need for a robust ChIP-Seq analysis tool, we introduce Model-based Analysis of ChIP-Seq (MACS), a powerful tool for identifying transcription factor binding sites. MACS accounts for the complexity of the genome to assess the significance of enriched ChIP regions and enhances the spatial resolution of binding sites by integrating both sequencing tag position and orientation. MACS can be readily applied to ChIP-Seq data alone, or in conjunction with a control sample to enhance specificity. Furthermore, as a versatile peak-caller, MACS can be employed in any “DNA enrichment assay” to answer the fundamental question: Where are the regions with significant read coverage compared to random background?
Changes for MACS (3.0.3)
Features added
- FRAG format is now supported for single-cell ATAC-seq in `callpeak` and `pileup`. FRAG format is used by 10x Genomics to store alignments from the single-cell ATAC-seq pipeline `cellranger-atac` or the multi-omics pipeline `cellranger-arc`. The format is essentially BEDPE with two additional columns: the barcode and the count of fragments aligned to the same location with the same barcode. Support for FRAG in other tools is coming soon, as well as for `hmmratac` calls.

  If you specify FRAG as your input format:

  - You can use a barcode list for a subset of cells with `--barcodes`; then `callpeak` will identify peaks and `pileup` will build a pileup track for the fragments of this subset of cells.
  - Duplicates will not be removed, as we assume all fragments are valid. Optionally, `--max-count` can be applied to set the maximum count.
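The FRAG layout described above can be illustrated with a minimal Python sketch. It is not the MACS3 parser: `Fragment` and `read_frag` are hypothetical names, the sketch assumes the three-column BEDPE form (chromosome, start, end) followed by barcode and count, and it assumes that a maximum count acts as a cap on each fragment's count.

```python
import io
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int
    barcode: str
    count: int


def read_frag(lines: Iterable[str],
              barcodes: Optional[Set[str]] = None,
              max_count: Optional[int] = None) -> Iterator[Fragment]:
    """Yield fragments from tab-separated FRAG-style lines.

    Hypothetical helper for illustration only; not part of the MACS3 API.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, barcode, count = line.split("\t")[:5]
        if barcodes is not None and barcode not in barcodes:
            continue  # keep only the requested subset of cells
        n = int(count)
        if max_count is not None:
            n = min(n, max_count)  # assumption: the maximum count caps the value
        yield Fragment(chrom, int(start), int(end), barcode, n)


# Made-up example records: chrom, start, end, barcode, count
example = io.StringIO(
    "chr1\t100\t250\tAAAC-1\t3\n"
    "chr1\t300\t480\tTTTG-1\t1\n"
    "chr2\t50\t210\tAAAC-1\t2\n"
)
kept = list(read_frag(example, barcodes={"AAAC-1"}, max_count=2))
```

Here `kept` holds only the two fragments carrying the barcode `AAAC-1`, and the first one has its count reduced from 3 to 2.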
- We transitioned our `pyx` code to `py` code, adopting a ‘pure Python style’ with PEP-484 type annotations. This change has made our source code more compatible with Python programming tools such as `flake8`. During this process, we performed further code cleaning and eliminated unnecessary dependencies. We intend to continue improving our code quality in the future.
- We have modified the handling of ‘blacklist’ regions in the `hmmratac` tool. This change impacts both the Expectation-Maximization (EM) step that estimates fragment length distributions, and the Hidden Markov Model (HMM) step that learns and predicts nucleosome states. We now exclude aligned fragments located in the ‘blacklist’ regions before both steps. We implemented the `exclude` functions in both PETrackI and PETrackII to support this feature. For more detailed information and the reasoning behind it, refer to issue #680.
- We have tested MACS3 with Numpy>=2. MACS3 can now be run on both Numpy version 1 and version 2.
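The idea of dropping fragments in excluded regions before any downstream step can be sketched in Python as follows. This is not the `exclude` implementation in PETrackI or PETrackII: `merge_intervals` and `exclude_fragments` are hypothetical names, intervals are assumed to be half-open `[start, end)`, and a fragment is assumed to be dropped if it overlaps an excluded region by at least one base.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exclude_fragments(fragments: Dict[str, List[Interval]],
                      excluded: Dict[str, List[Interval]]
                      ) -> Dict[str, List[Interval]]:
    """Return the fragments that do not overlap any excluded region."""
    kept: Dict[str, List[Interval]] = {}
    for chrom, frags in fragments.items():
        regions = merge_intervals(excluded.get(chrom, []))
        starts = [s for s, _ in regions]
        out: List[Interval] = []
        for f_start, f_end in frags:
            # Among disjoint sorted regions, only the last region that starts
            # before the fragment ends can overlap the fragment.
            i = bisect_right(starts, f_end - 1) - 1
            if i >= 0 and regions[i][1] > f_start:
                continue  # overlaps an excluded region: drop it
            out.append((f_start, f_end))
        kept[chrom] = out
    return kept


# Made-up coordinates for illustration
fragments = {"chr1": [(100, 200), (500, 650), (900, 1000)], "chr2": [(10, 60)]}
excluded = {"chr1": [(150, 300), (1000, 1100)]}
filtered = exclude_fragments(fragments, excluded)
```

Filtering once, up front, means that every later step sees the same reduced fragment set, which is the property the change above describes for the EM and HMM steps.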
Bugs fixed
- The `hmmratac` option `--keep-duplicate` previously had the opposite effect of what its name and description suggested. Therefore, it was renamed to `--remove-dup` to more accurately describe the actual behavior. Duplicate fragments will not be removed by `hmmratac` unless this option is explicitly set.
- `hmmratac`: the wrong class name was used while saving digested signals in BedGraph files. Fixed multiple other issues related to output filenames. #682
- Fixed issues on big-endian systems in the `Parser.py` code. Enabled big-endian support in the `BAM.py` code for accessing certain alignment records that overlap with given genomic coordinates using BAM/BAI files.
- `predictd` and `filterdup`: the wrong variable name was used while reading multiple pe/frag files.
Doc
Added an explanation of the filtering criteria for SAM/BAM/BAMPE files.
Install
The common way to install MACS is through PyPI or conda. Please check the INSTALL document for details.
MACS3 has been tested using GitHub Actions for every push and PR on the following architectures:
x86_64 (Ubuntu 22, Python 3.9, 3.10, 3.11, 3.12, 3.13)
aarch64 (Ubuntu 22, Python 3.10)
armv7 (Ubuntu 22, Python 3.10)
ppc64le (Ubuntu 22, Python 3.10)
s390x (Ubuntu 22, Python 3.10)
Apple chips (Mac OS 13, Python 3.9, 3.10, 3.11, 3.12, 3.13)
In general, you can install through PyPI with `pip install macs3`. Using a virtual environment is highly recommended. Alternatively, you can unzip the released package downloaded from GitHub and then run `pip install .` in the unpacked directory. Please note that we haven’t tested installation on any Windows OS, so currently only Linux and Mac OS systems are supported. Also, on aarch64, armv7, ppc64le and s390x, the results from the `hmmratac` subcommand may not be consistent with the results from x86 or Apple chips. The reason is unknown, but it is potentially related to the scientific calculation libraries MACS3 depends on, such as Numpy, Scipy, hmm-learn, and scikit-learn.
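A pipeline that compares `hmmratac` results across machines may want to record whether it ran on one of the platforms where results are expected to agree. The following is a minimal sketch, not part of MACS3: the function name is hypothetical, and the strings it compares against (`x86_64`, and `arm64` on `Darwin` for Apple chips) are assumptions about what Python's `platform` module reports on those systems.

```python
import platform
from typing import Optional


def hmmratac_results_expected_consistent(system: Optional[str] = None,
                                         machine: Optional[str] = None) -> bool:
    """Return True on x86_64 or Apple chips, the reference platforms above.

    Hypothetical helper; the platform strings are assumptions, not MACS3 API.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if machine == "x86_64":
        return True
    # Apple chips are assumed to report "arm64" under macOS ("Darwin"),
    # while Linux on ARM is assumed to report "aarch64".
    return system == "Darwin" and machine == "arm64"


on_reference_platform = hmmratac_results_expected_consistent()
```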
Usage
Example for regular peak calling on TF ChIP-seq:
macs3 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01
Example for broad peak calling on Histone Mark ChIP-seq:
macs3 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1
Example for peak calling on ATAC-seq (paired-end mode):
macs3 callpeak -f BAMPE -t ATAC.bam -g hs -n test -B -q 0.01
Example for peak calling on ATAC-seq with HMMRATAC:
macs3 hmmratac -i ATAC.bam -f BAMPE -n test
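When these commands are launched from a script, it can help to assemble the argument list in one place. The sketch below only builds the command; `build_callpeak_command` is a hypothetical helper, and it uses only the `callpeak` options that appear in the examples above.

```python
import shlex
from typing import List, Optional


def build_callpeak_command(treatment: str,
                           control: Optional[str] = None,
                           fmt: str = "BAM",
                           genome: str = "hs",
                           name: str = "test",
                           qvalue: float = 0.01,
                           bedgraph: bool = True) -> List[str]:
    """Assemble a `macs3 callpeak` argument list (hypothetical helper)."""
    cmd = ["macs3", "callpeak", "-t", treatment]
    if control is not None:
        cmd += ["-c", control]
    cmd += ["-f", fmt, "-g", genome, "-n", name]
    if bedgraph:
        cmd.append("-B")
    cmd += ["-q", str(qvalue)]
    return cmd


tf_cmd = build_callpeak_command("ChIP.bam", control="Control.bam")
atac_cmd = build_callpeak_command("ATAC.bam", fmt="BAMPE")
print(shlex.join(tf_cmd))
# To actually run it (requires macs3 on PATH):
#   import subprocess; subprocess.run(tf_cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.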
There are currently 14 functions available in MACS3 serving as subcommands. Please click on the link to see the detailed description of each subcommand.
Subcommand | Description |
---|---|
callpeak | Main MACS3 function to call peaks from alignment results. |
bdgpeakcall | Call peaks from a bedGraph file. |
bdgbroadcall | Call nested broad peaks from a bedGraph file. |
bdgcmp | Compare two signal tracks in bedGraph format. |
bdgopt | Operate on the score column of a bedGraph file. |
cmbreps | Combine bedGraph files of scores from replicates. |
bdgdiff | Differential peak detection based on four paired bedGraph files. |
filterdup | Remove duplicate reads, then save in a BED/BEDPE format file. |
predictd | Predict d or fragment size from alignment results. In the case of PE data, report the average insertion/fragment size from all pairs. |
pileup | Pile up aligned reads (single-end) or fragments (paired-end). |
randsample | Randomly choose a number/percentage of total reads, then save in a BED/BEDPE format file. |
refinepeak | Take raw read alignments and refine peak summits. |
callvar | Call variants in given peak regions from the alignment BAM files. |
hmmratac | Dedicated peak calling based on a Hidden Markov Model for ATAC-seq data. |
For advanced usage, for example to run `macs3` in a modular way, please read the advanced usage document. There is also a Q&A document where we collected some common questions from users.
Contribute
Please read our CODE OF CONDUCT and How to contribute documents. If you have any questions or suggestions/ideas, or just want to have conversations with developers and other users in the community, we recommend using the MACS Discussions instead of posting to our Issues page.
Acknowledgement
The MACS3 project is sponsored by . We particularly want to thank the user community for their support, feedback, and contributions over the years.