MACS3.IO.PeakIO module

Module for PeakIO IO classes.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.IO.PeakIO.BroadPeakContent

Bases: object

Container for broad peak metadata used in broadPeak format.

class MACS3.IO.PeakIO.BroadPeakIO

Bases: object

IO for broad peak information.

peaks

Mapping of chromosome name to BroadPeakContent list.

add(chromosome, start, end, score=0.0, thickStart=b'.', thickEnd=b'.', blockNum=0, blockSizes=b'.', blockStarts=b'.', pileup=0.0, pscore=0.0, fold_change=0.0, qscore=0.0, name=b'NA')

Append a BroadPeakContent record.

Parameters:
  • chromosome (bytes) – Chromosome name for the region.

  • start (cython.int) – 0-based inclusive start coordinate.

  • end (cython.int) – 0-based exclusive end coordinate.

  • score (cython.float) – Average score across blocks.

  • thickStart (bytes) – Start of the high-enrichment segment or b'.'.

  • thickEnd (bytes) – End of the high-enrichment segment or b'.'.

  • blockNum (cython.int) – Number of sub-blocks composing the region.

  • blockSizes (bytes) – Comma-separated block sizes as bytes.

  • blockStarts (bytes) – Comma-separated block starts as bytes.

  • pileup (cython.float) – Median pileup within the region.

  • pscore (cython.float) – Median -log10(pvalue).

  • fold_change (cython.float) – Median fold-change value.

  • qscore (cython.float) – Median -log10(qvalue).

  • name (bytes) – Optional region identifier.

Examples

from MACS3.IO.PeakIO import BroadPeakIO
peaks = BroadPeakIO()
peaks.add(b"chr1", 100, 500, score=10.0, blockNum=1,
          blockSizes=b"400", blockStarts=b"0",...)
filter_fc(fc_low, fc_up=-1.0)

Filter broad peaks by fold-change range.

Parameters:
  • fc_low (float) – Inclusive lower bound on fold change.

  • fc_up (float) – Exclusive upper bound; ignored when negative.

filter_pscore(pscore_cut)

Retain broad peaks with -log10(pvalue)pscore_cut.

Parameters:

pscore_cut (cython.float) – Inclusive lower bound for -log10(pvalue).

filter_qscore(qscore_cut)

Retain broad peaks with -log10(qvalue)qscore_cut.

Parameters:

qscore_cut (cython.float) – Inclusive lower bound for -log10(qvalue).

peaks
total()

Return the total number of broad peaks currently stored.

Returns:

Number of broad peaks.

Return type:

int

write_to_Bed12(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in BED12 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.

Examples

with open("broad.bed12", "w") as f:
    peaks.write_to_Bed12(f)
write_to_broadPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in the ENCODE broadPeak (BED6+3) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.

Examples

with open("broad.broadPeak", "w") as f:
    peaks.write_to_broadPeak(f)
write_to_gappedPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in gappedPeak (BED12+3) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.

Examples

with open("broad.gappedPeak", "w") as f:
    peaks.write_to_gappedPeak(f)
write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')

Export broad peaks to a tab-delimited .xls text file.

Parameters:
  • ofhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

Examples

with open("broad.xls", "w") as f:
    peaks.write_to_xls(f)
class MACS3.IO.PeakIO.PeakContent

Bases: object

Represent a narrow peak and its derived statistics.

chrom

Chromosome name (bytes).

start

0-based inclusive start coordinate.

end

0-based exclusive end coordinate.

length

Peak length in base pairs.

summit

0-based summit position.

score

Peak score reported by MACS3.

pileup

Tag pileup at the summit.

pscore

-log10(pvalue) at the summit.

fc

Fold enrichment at the summit.

qscore

-log10(qvalue) at the summit.

name

Optional peak identifier.

class MACS3.IO.PeakIO.PeakIO

Bases: object

Manage in-memory collections of narrow peak intervals.

peaks

Dictionary storing peak contents.

CO_sorted

Whether peaks have been coordinate-sorted.

total

Total number of peaks.

name

Collection name used in output.

CO_sorted
add(chromosome, start, end, summit=0, peak_score=0.0, pileup=0.0, pscore=0.0, fold_change=0.0, qscore=0.0, name=b'')

Add a peak described by raw coordinates and scores.

Parameters:
  • chromosome (bytes) – Chromosome name for the peak.

  • start (cython.int) – 0-based inclusive start coordinate.

  • end (cython.int) – 0-based exclusive end coordinate.

  • summit (cython.int) – 0-based summit position.

  • peak_score (cython.float) – Reported peak score.

  • pileup (cython.float) – Tag pileup at the summit.

  • pscore (cython.float) – -log10(pvalue) score.

  • fold_change (cython.float) – Fold enrichment relative to control.

  • qscore (cython.float) – -log10(qvalue) score.

  • name (bytes) – Optional peak identifier.

Examples

from MACS3.IO.PeakIO import PeakIO
peaks = PeakIO()
peaks.add(b"chr1", 100, 200, summit=150, peak_score=10.0)
add_PeakContent(chromosome, peakcontent)

Extend the collection with an existing PeakContent.

Parameters:
  • chromosome (bytes) – Chromosome name under which to store the peak.

  • peakcontent (PeakContent) – Peak record to append.

Examples

from MACS3.IO.PeakIO import PeakIO, PeakContent
peaks = PeakIO()
peak = PeakContent(b"chr1", 100, 200, 150, 10.0, 5.0, 3.0, 2.0, 1.0)
peaks.add_PeakContent(b"chr1", peak)
exclude(peaksio2)

Remove peaks that overlap any entry in peaksio2.

Parameters:

peaksio2 (object) – Another PeakIO instance providing exclusion regions.

filter_fc(fc_low, fc_up=0.0)

Filter peaks by fold-change range.

Parameters:
  • fc_low (cython.float) – Inclusive lower bound on fold change.

  • fc_up (cython.float) – Exclusive upper bound; ignored if <= 0.

filter_pscore(pscore_cut)

Filter peaks by minimum -log10(pvalue).

Parameters:

pscore_cut (cython.double) – Lower bound (inclusive) for -log10(pvalue).

filter_qscore(qscore_cut)

Filter peaks by minimum -log10(qvalue).

Parameters:

qscore_cut (cython.double) – Lower bound (inclusive) for -log10(qvalue).

filter_score(lower_score, upper_score=0.0)

Filter peaks by their primary score range.

Parameters:
  • lower_score (cython.float) – Inclusive lower bound on score.

  • upper_score (cython.float) – Exclusive upper bound; if <= 0 the bound is ignored.

get_chr_names()

Return the chromosome names represented in the collection.

Returns:

Unique chromosome names.

Return type:

set

Examples

peaks = PeakIO()
names = peaks.get_chr_names()
get_data_from_chrom(chrom)

Return peaks for chrom, initialising storage if needed.

Parameters:

chrom (bytes) – Chromosome name to query.

Returns:

Peaks associated with chrom.

Return type:

list

Examples

peaks = PeakIO()
chrom_peaks = peaks.get_data_from_chrom(b"chr1")
name
peaks
randomly_pick(n, seed=12345)

Return a new PeakIO containing n randomly sampled peaks.

Parameters:
  • n (cython.int) – Number of peaks to sample.

  • seed (cython.int) – RNG seed to ensure reproducibility.

Returns:

Fresh instance populated with sampled peaks.

Return type:

PeakIO

Examples

sampled = peaks.randomly_pick(100, seed=42)
read_from_xls(ofhd)

Load peak records from a MACS3 .xls tab-delimited report.

Parameters:

ofhd – Readable file-like object positioned at the beginning of the report.

sort()

Sort peaks on each chromosome by ascending start position.

to_summits_bed()

Write peak summits in BED5 format to stdout.

Each summit is emitted as a one-base interval with the selected score column.

tobed()

Write peaks in BED5 format to stdout.

The five columns correspond to chromosome, start, end, name, and the attribute selected by score_column.

total
write_to_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=True)

Write peaks to a file handle in BED5 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak names.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for optional header line.

  • score_column (str) – Peak attribute to emit as the score field.

  • trackline (bool) – Whether to emit a UCSC track header line.

Examples

with open("peaks.bed", "w") as f:
    peaks.write_to_bed(f)
write_to_narrowPeak(fhd, name_prefix=b'%s_peak_', name=b'MACS', score_column='score', trackline=False)

Write peaks in the ENCODE narrowPeak (BED6+4) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • score_column (str) – Peak attribute mapped to the narrowPeak score field.

  • trackline (bool) – Whether to emit a UCSC track header.

Examples

with open("peaks.narrowPeak", "w") as f:
    peaks.write_to_narrowPeak(f)
write_to_summit_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=False)

Write peak summits to a file handle in BED5 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build summit names.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for optional header line.

  • score_column (str) – Peak attribute to emit as the score field.

  • trackline (bool) – Whether to emit a UCSC track header line.

Examples

with open("summits.bed", "w") as f:
    peaks.write_to_summit_bed(f)
write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')

Export narrow peaks to a tab-delimited .xls text file.

Parameters:
  • ofhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

Examples

with open("peaks.xls", "w") as f:
    peaks.write_to_xls(f)
class MACS3.IO.PeakIO.RegionIO

Bases: object

Helper for storing and manipulating simple genomic regions.

regions

Mapping of chromosome name to a list of (start, end) tuples.

__flag_sorted

Whether per-chromosome regions are sorted.

add_loc(chrom, start, end)

Append a new (start, end) interval for chrom.

Examples

regions = RegionIO()
regions.add_loc(b"chr1", 100, 200)
get_chr_names()

Return chromosome names present in the region set.

merge_overlap()

Merge overlapping intervals within each chromosome.

sort()

Sort regions for each chromosome by their start coordinate.

write_to_bed(fhd)

Emit regions in BED format to the provided file-like object.

Examples

with open("regions.bed", "w") as f:
    regions.write_to_bed(f)