MACS3.IO.PeakIO module
Module for PeakIO IO classes.
This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).
- class MACS3.IO.PeakIO.BroadPeakContent(start, end, score, thickStart, thickEnd, blockNum, blockSizes, blockStarts, pileup, pscore, fold_change, qscore, name=b'MACS3')
Bases:
objectContainer for broad peak metadata used in broadPeak format.
- blockNum: int
- blockSizes: bytes
- blockStarts: bytes
- end: int
- fc: float
- length: int
- name: bytes
- pileup: float
- pscore: float
- qscore: float
- score: float
- start: int
- thickEnd: bytes
- thickStart: bytes
- class MACS3.IO.PeakIO.BroadPeakIO
Bases:
objectIO for broad peak information.
- peaks
Mapping of chromosome name to
BroadPeakContentlist.
- add(chromosome, start, end, score=0.0, thickStart=b'.', thickEnd=b'.', blockNum=0, blockSizes=b'.', blockStarts=b'.', pileup=0, pscore=0, fold_change=0, qscore=0, name=b'NA')
Append a
BroadPeakContentrecord.- Parameters:
chromosome (
bytes) – Chromosome name for the region.start (
typedef) – 0-based inclusive start coordinate.end (
typedef) – 0-based exclusive end coordinate.score (
typedef) – Average score across blocks.thickStart (
bytes) – Start of the high-enrichment segment orb'.'.thickEnd (
bytes) – End of the high-enrichment segment orb'.'.blockNum (
typedef) – Number of sub-blocks composing the region.blockSizes (
bytes) – Comma-separated block sizes as bytes.blockStarts (
bytes) – Comma-separated block starts as bytes.pileup (
typedef) – Median pileup within the region.pscore (
typedef) – Median-log10(pvalue).fold_change (
typedef) – Median fold-change value.qscore (
typedef) – Median-log10(qvalue).name (
bytes) – Optional region identifier.
Examples
from MACS3.IO.PeakIO import BroadPeakIO peaks = BroadPeakIO() peaks.add(b"chr1", 100, 500, score=10.0, blockNum=1, blockSizes=b"400", blockStarts=b"0",...)
- filter_fc(fc_low, fc_up=-1)
Filter broad peaks by fold-change range.
- Parameters:
fc_low (
float) – Inclusive lower bound on fold change.fc_up (
float) – Exclusive upper bound; ignored when negative.
- filter_pscore(pscore_cut)
Retain broad peaks with
-log10(pvalue)≥pscore_cut.- Parameters:
pscore_cut (
typedef) – Inclusive lower bound for-log10(pvalue).
- filter_qscore(qscore_cut)
Retain broad peaks with
-log10(qvalue)≥qscore_cut.- Parameters:
qscore_cut (
typedef) – Inclusive lower bound for-log10(qvalue).
- peaks = None
- total()
Return the total number of broad peaks currently stored.
- Returns:
Number of broad peaks.
- Return type:
int
- write_to_Bed12(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)
Write broad peaks in BED12 format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to construct peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.description (
bytes) – Track description for the optional header.score_column (
str) – Peak attribute mapped to the score column.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader.
Examples
with open("broad.bed12", "w") as f: peaks.write_to_Bed12(f)
- write_to_broadPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)
Write broad peaks in the ENCODE broadPeak (BED6+3) format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to construct peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.description (
bytes) – Track description for the optional header.score_column (
str) – Peak attribute mapped to the score column.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader.
Examples
with open("broad.broadPeak", "w") as f: peaks.write_to_broadPeak(f)
- write_to_gappedPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)
Write broad peaks in gappedPeak (BED12+3) format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to construct peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.description (
bytes) – Track description for the optional header.score_column (
str) – Peak attribute mapped to the score column.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader.
Examples
with open("broad.gappedPeak", "w") as f: peaks.write_to_gappedPeak(f)
- write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')
Export broad peaks to a tab-delimited
.xlstext file.- Parameters:
ofhd – Writable file-like object.
name_prefix (
bytes) – Template used to build peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.
Examples
with open("broad.xls", "w") as f: peaks.write_to_xls(f)
- class MACS3.IO.PeakIO.PeakContent(chrom, start, end, summit, peak_score, pileup, pscore, fold_change, qscore, name=b'')
Bases:
objectRepresent a narrow peak and its derived statistics.
- chrom
Chromosome name (bytes).
- start
0-based inclusive start coordinate.
- end
0-based exclusive end coordinate.
- length
Peak length in base pairs.
- summit
0-based summit position.
- score
Peak score reported by MACS3.
- pileup
Tag pileup at the summit.
- pscore
-log10(pvalue)at the summit.
- fc
Fold enrichment at the summit.
- qscore
-log10(qvalue)at the summit.
- name
Optional peak identifier.
- chrom: bytes
- end: int
- fc: float
- length: int
- name: bytes
- pileup: float
- pscore: float
- qscore: float
- score: float
- start: int
- summit: int
- class MACS3.IO.PeakIO.PeakIO(name=b'MACS3')
Bases:
objectManage in-memory collections of narrow peak intervals.
- peaks
Dictionary storing peak contents.
- CO_sorted
Whether peaks have been coordinate-sorted.
- total
Total number of peaks.
- name
Collection name used in output.
- CO_sorted = None
- add(chromosome, start, end, summit=0, peak_score=0, pileup=0, pscore=0, fold_change=0, qscore=0, name=b'')
Add a peak described by raw coordinates and scores.
- Parameters:
chromosome (
bytes) – Chromosome name for the peak.start (
typedef) – 0-based inclusive start coordinate.end (
typedef) – 0-based exclusive end coordinate.summit (
typedef) – 0-based summit position.peak_score (
typedef) – Reported peak score.pileup (
typedef) – Tag pileup at the summit.pscore (
typedef) –-log10(pvalue)score.fold_change (
typedef) – Fold enrichment relative to control.qscore (
typedef) –-log10(qvalue)score.name (
bytes) – Optional peak identifier.
Examples
from MACS3.IO.PeakIO import PeakIO peaks = PeakIO() peaks.add(b"chr1", 100, 200, summit=150, peak_score=10.0)
- add_PeakContent(chromosome, peakcontent)
Extend the collection with an existing
PeakContent.- Parameters:
chromosome (
bytes) – Chromosome name under which to store the peak.peakcontent (
PeakContent) – Peak record to append.
Examples
from MACS3.IO.PeakIO import PeakIO, PeakContent peaks = PeakIO() peak = PeakContent(b"chr1", 100, 200, 150, 10.0, 5.0, 3.0, 2.0, 1.0) peaks.add_PeakContent(b"chr1", peak)
- exclude(peaksio2)
Remove peaks that overlap any entry in
peaksio2.- Parameters:
peaksio2 (
object) – AnotherPeakIOinstance providing exclusion regions.
- filter_fc(fc_low, fc_up=0)
Filter peaks by fold-change range.
- Parameters:
fc_low (
typedef) – Inclusive lower bound on fold change.fc_up (
typedef) – Exclusive upper bound; ignored if<= 0.
- filter_pscore(pscore_cut)
Filter peaks by minimum
-log10(pvalue).- Parameters:
pscore_cut (
typedef) – Lower bound (inclusive) for-log10(pvalue).
- filter_qscore(qscore_cut)
Filter peaks by minimum
-log10(qvalue).- Parameters:
qscore_cut (
typedef) – Lower bound (inclusive) for-log10(qvalue).
- filter_score(lower_score, upper_score=0)
Filter peaks by their primary score range.
- Parameters:
lower_score (
typedef) – Inclusive lower bound on score.upper_score (
typedef) – Exclusive upper bound; if<= 0the bound is ignored.
- get_chr_names()
Return the chromosome names represented in the collection.
- Returns:
Unique chromosome names.
- Return type:
set
Examples
peaks = PeakIO() names = peaks.get_chr_names()
- get_data_from_chrom(chrom)
Return peaks for
chrom, initialising storage if needed.- Parameters:
chrom (
bytes) – Chromosome name to query.- Returns:
Peaks associated with
chrom.- Return type:
list
Examples
peaks = PeakIO() chrom_peaks = peaks.get_data_from_chrom(b"chr1")
- name = None
- peaks = None
- randomly_pick(n, seed=12345)
Return a new
PeakIOcontainingnrandomly sampled peaks.- Parameters:
n (
typedef) – Number of peaks to sample.seed (
typedef) – RNG seed to ensure reproducibility.
- Returns:
Fresh instance populated with sampled peaks.
- Return type:
Examples
sampled = peaks.randomly_pick(100, seed=42)
- read_from_xls(ofhd)
Load peak records from a MACS3
.xlstab-delimited report.- Parameters:
ofhd – Readable file-like object positioned at the beginning of the report.
- sort()
Sort peaks on each chromosome by ascending start position.
- to_summits_bed()
Write peak summits in BED5 format to
stdout.Each summit is emitted as a one-base interval with the selected score column.
- tobed()
Write peaks in BED5 format to
stdout.The five columns correspond to chromosome, start, end, name, and the attribute selected by
score_column.
- total = None
- write_to_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=True)
Write peaks to a file handle in BED5 format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to build peak names.name (
bytes) – Dataset label interpolated intoname_prefix.description (
bytes) – Track description for optional header line.score_column (
str) – Peak attribute to emit as the score field.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader line.
Examples
with open("peaks.bed", "w") as f: peaks.write_to_bed(f)
- write_to_narrowPeak(fhd, name_prefix=b'%s_peak_', name=b'MACS', score_column='score', trackline=False)
Write peaks in the ENCODE narrowPeak (BED6+4) format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to construct peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.score_column (
str) – Peak attribute mapped to the narrowPeak score field.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader.
Examples
with open("peaks.narrowPeak", "w") as f: peaks.write_to_narrowPeak(f)
- write_to_summit_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=False)
Write peak summits to a file handle in BED5 format.
- Parameters:
fhd – Writable file-like object.
name_prefix (
bytes) – Template used to build summit names.name (
bytes) – Dataset label interpolated intoname_prefix.description (
bytes) – Track description for optional header line.score_column (
str) – Peak attribute to emit as the score field.trackline (
_fake_callable) – Whether to emit a UCSCtrackheader line.
Examples
with open("summits.bed", "w") as f: peaks.write_to_summit_bed(f)
- write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')
Export narrow peaks to a tab-delimited
.xlstext file.- Parameters:
ofhd – Writable file-like object.
name_prefix (
bytes) – Template used to build peak identifiers.name (
bytes) – Dataset label interpolated intoname_prefix.
Examples
with open("peaks.xls", "w") as f: peaks.write_to_xls(f)
- class MACS3.IO.PeakIO.RegionIO
Bases:
objectHelper for storing and manipulating simple genomic regions.
- regions
Mapping of chromosome name to a list of
(start, end)tuples.
- __flag_sorted
Whether per-chromosome regions are sorted.
- add_loc(chrom, start, end)
Append a new
(start, end)interval forchrom.Examples
regions = RegionIO() regions.add_loc(b"chr1", 100, 200)
- get_chr_names()
Return chromosome names present in the region set.
- Return type:
set
- merge_overlap()
Merge overlapping intervals within each chromosome.
- regions: dict
- sort()
Sort regions for each chromosome by their start coordinate.
- write_to_bed(fhd)
Emit regions in BED format to the provided file-like object.
Examples
with open("regions.bed", "w") as f: regions.write_to_bed(f)
- MACS3.IO.PeakIO.subpeak_letters(i)
Return the alphabetical label for a zero-based subpeak index.
- Parameters:
i (
typedef) – Zero-based subpeak index.- Returns:
Alphabetical label sequence (
a,b, …,aa).- Return type:
str