MACS3.IO.PeakIO module

Module providing peak I/O classes.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.IO.PeakIO.BroadPeakContent(start, end, score, thickStart, thickEnd, blockNum, blockSizes, blockStarts, pileup, pscore, fold_change, qscore, name=b'MACS3')

Bases: object

Container for broad peak metadata used in broadPeak format.

blockNum: typedef
blockSizes: bytes
blockStarts: bytes
end: typedef
fc: typedef
length: typedef
name: bytes
pileup: typedef
pscore: typedef
qscore: typedef
score: typedef
start: typedef
thickEnd: bytes
thickStart: bytes

class MACS3.IO.PeakIO.BroadPeakIO

Bases: object

IO for broad peak information.

add(chromosome, start, end, score=0.0, thickStart=b'.', thickEnd=b'.', blockNum=0, blockSizes=b'.', blockStarts=b'.', pileup=0, pscore=0, fold_change=0, qscore=0, name=b'NA')

Append a BroadPeakContent record.

Parameters:
  • chromosome (bytes) – Chromosome name for the region.

  • start (typedef) – 0-based inclusive start coordinate.

  • end (typedef) – 0-based exclusive end coordinate.

  • score (typedef) – Average score across blocks.

  • thickStart (bytes) – Start of the high-enrichment segment or b'.'.

  • thickEnd (bytes) – End of the high-enrichment segment or b'.'.

  • blockNum (typedef) – Number of sub-blocks composing the region.

  • blockSizes (bytes) – Comma-separated block sizes as bytes.

  • blockStarts (bytes) – Comma-separated block starts as bytes.

  • pileup (typedef) – Median pileup within the region.

  • pscore (typedef) – Median -log10(pvalue).

  • fold_change (typedef) – Median fold-change value.

  • qscore (typedef) – Median -log10(qvalue).

  • name (bytes) – Optional region identifier.
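
The blockSizes and blockStarts arguments are comma-separated byte strings. The following is a minimal sketch of how such fields could be built from a list of sub-block intervals. It assumes BED12-style block starts expressed relative to the region start; the helper name encode_blocks is hypothetical and not part of MACS3.

```python
def encode_blocks(region_start, blocks):
    """Return (blockNum, blockSizes, blockStarts) for (start, end) sub-blocks.

    Hypothetical helper: block starts are written relative to region_start,
    as in BED12 (an assumption, not confirmed against MACS3).
    """
    blocks = sorted(blocks)
    sizes = b",".join(b"%d" % (end - start) for start, end in blocks)
    starts = b",".join(b"%d" % (start - region_start) for start, end in blocks)
    return len(blocks), sizes, starts


block_num, block_sizes, block_starts = encode_blocks(
    1000, [(1000, 1200), (1500, 1800)]
)
print(block_num, block_sizes, block_starts)
```

The three returned values could then be passed as blockNum, blockSizes, and blockStarts.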

filter_fc(fc_low, fc_up=-1)

Filter broad peaks by fold-change range.

Parameters:
  • fc_low (float) – Inclusive lower bound on fold change.

  • fc_up (float) – Exclusive upper bound; ignored when negative.
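
The bounds described above can be sketched on plain dictionaries. This illustrates the documented semantics only (inclusive lower bound, exclusive upper bound, negative fc_up disables the upper bound); the function name and the dict-based record layout are assumptions, not the library's internals.

```python
def filter_by_fold_change(peaks, fc_low, fc_up=-1):
    """Keep records with fc_low <= fc < fc_up; a negative fc_up means no upper bound."""
    kept = {}
    for chrom, records in peaks.items():
        if fc_up < 0:
            kept[chrom] = [p for p in records if p["fc"] >= fc_low]
        else:
            kept[chrom] = [p for p in records if fc_low <= p["fc"] < fc_up]
    return kept


# Hypothetical records keyed by chromosome name.
peaks = {
    b"chr1": [
        {"start": 100, "fc": 1.5},
        {"start": 400, "fc": 4.0},
        {"start": 900, "fc": 12.0},
    ]
}
print(filter_by_fold_change(peaks, 2.0))        # lower bound only
print(filter_by_fold_change(peaks, 2.0, 10.0))  # both bounds
```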

filter_pscore(pscore_cut)

Retain broad peaks with -log10(pvalue) >= pscore_cut.

filter_qscore(qscore_cut)

Retain broad peaks with -log10(qvalue) >= qscore_cut.

peaks = None

total()

Return the total number of broad peaks currently stored.

write_to_Bed12(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in BED12 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.

write_to_broadPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in the ENCODE broadPeak (BED6+3) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.
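
For orientation, a broadPeak record has nine tab-separated columns: chrom, start, end, name, score, strand, signalValue, pValue, and qValue. The sketch below formats one such line from a dict. Mapping fold change to signalValue, writing "." for the strand, and the record layout are assumptions made for illustration; this is not the library's writer.

```python
def format_broadpeak_line(chrom, peak, peak_name):
    """Format one record as a 9-column broadPeak line (bytes)."""
    return b"%s\t%d\t%d\t%s\t%d\t.\t%.5f\t%.5f\t%.5f\n" % (
        chrom,
        peak["start"],
        peak["end"],
        peak_name,
        int(peak["score"]),
        peak["fc"],      # assumed to fill the signalValue column
        peak["pscore"],  # -log10(pvalue)
        peak["qscore"],  # -log10(qvalue)
    )


# Hypothetical record for demonstration.
peak = {"start": 100, "end": 900, "score": 57.0, "fc": 3.2, "pscore": 8.1, "qscore": 5.7}
line = format_broadpeak_line(b"chr1", peak, b"peak_1")
print(line)
```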

write_to_gappedPeak(fhd, name_prefix=b'peak_', name=b'peak', description=b'%s', score_column='score', trackline=True)

Write broad peaks in gappedPeak (BED12+3) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for the optional header.

  • score_column (str) – Peak attribute mapped to the score column.

  • trackline (bool) – Whether to emit a UCSC track header.

write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')

Export broad peaks to a tab-delimited .xls text file.

Parameters:
  • ofhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.
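
The interaction between name_prefix and name can be sketched with bytes %-formatting. The helper build_peak_name is hypothetical, and the fallback for templates without a %s placeholder is an assumption about reasonable behaviour rather than a statement of what the library does.

```python
def build_peak_name(name_prefix, name, index):
    """Interpolate name into name_prefix if it has a %s placeholder, then append index."""
    try:
        prefix = name_prefix % name
    except TypeError:
        # Template has no placeholder (e.g. b"peak_"): use it unchanged.
        prefix = name_prefix
    return b"%s%d" % (prefix, index)


print(build_peak_name(b"%s_peak_", b"MACS", 1))
print(build_peak_name(b"peak_", b"peak", 7))
```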

class MACS3.IO.PeakIO.PeakContent(chrom, start, end, summit, peak_score, pileup, pscore, fold_change, qscore, name=b'')

Bases: object

Represent a narrow peak and its derived statistics.

chrom: bytes
end: typedef
fc: typedef
length: typedef
name: bytes
pileup: typedef
pscore: typedef
qscore: typedef
score: typedef
start: typedef
summit: typedef

class MACS3.IO.PeakIO.PeakIO(name=b'MACS3')

Bases: object

Manage in-memory collections of narrow peak intervals.

CO_sorted = None

add(chromosome, start, end, summit=0, peak_score=0, pileup=0, pscore=0, fold_change=0, qscore=0, name=b'')

Add a peak described by raw coordinates and scores.

Parameters:
  • chromosome (bytes) – Chromosome name for the peak.

  • start (typedef) – 0-based inclusive start coordinate.

  • end (typedef) – 0-based exclusive end coordinate.

  • summit (typedef) – 0-based summit position.

  • peak_score (typedef) – Reported peak score.

  • pileup (typedef) – Tag pileup at the summit.

  • pscore (typedef) – -log10(pvalue) score.

  • fold_change (typedef) – Fold enrichment relative to control.

  • qscore (typedef) – -log10(qvalue) score.

  • name (bytes) – Optional peak identifier.

add_PeakContent(chromosome, peakcontent)

Extend the collection with an existing PeakContent.

Parameters:
  • chromosome (bytes) – Chromosome name under which to store the peak.

  • peakcontent (PeakContent) – Peak record to append.

exclude(peaksio2)

Remove peaks that overlap any entry in peaksio2.

Parameters:

peaksio2 (object) – Another PeakIO instance providing exclusion regions.
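
The exclusion rule can be illustrated on plain (start, end) tuples with half-open coordinates. This is a simple quadratic sketch of the overlap test, not the library's algorithm; a sorted sweep would scale better on large peak sets. All names are hypothetical.

```python
def exclude_overlapping(peaks, exclusion):
    """Drop intervals that overlap any exclusion interval on the same chromosome."""
    kept = {}
    for chrom, records in peaks.items():
        blocked = exclusion.get(chrom, [])
        kept[chrom] = [
            (start, end)
            for start, end in records
            # Half-open intervals overlap when each starts before the other ends.
            if not any(start < b_end and b_start < end for b_start, b_end in blocked)
        ]
    return kept


peaks = {b"chr1": [(100, 200), (300, 400), (500, 600)], b"chr2": [(10, 20)]}
exclusion = {b"chr1": [(150, 320)]}
print(exclude_overlapping(peaks, exclusion))
```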

filter_fc(fc_low, fc_up=0)

Filter peaks by fold-change range.

Parameters:
  • fc_low (typedef) – Inclusive lower bound on fold change.

  • fc_up (typedef) – Exclusive upper bound; ignored if <= 0.

filter_pscore(pscore_cut)

Filter peaks by minimum -log10(pvalue).

Parameters:

pscore_cut (typedef) – Lower bound (inclusive) for -log10(pvalue).

filter_qscore(qscore_cut)

Filter peaks by minimum -log10(qvalue).

Parameters:

qscore_cut (typedef) – Lower bound (inclusive) for -log10(qvalue).

filter_score(lower_score, upper_score=0)

Filter peaks by their primary score range.

Parameters:
  • lower_score (typedef) – Inclusive lower bound on score.

  • upper_score (typedef) – Exclusive upper bound; if <= 0 the bound is ignored.

get_chr_names()

Return the chromosome names represented in the collection.

Returns:

Unique chromosome names.

Return type:

set

get_data_from_chrom(chrom)

Return peaks for chrom, initialising storage if needed.

Parameters:

chrom (bytes) – Chromosome name to query.

Returns:

Peaks associated with chrom.

Return type:

list

name = None
peaks = None

randomly_pick(n, seed=12345)

Return a new PeakIO containing n randomly sampled peaks.

Parameters:
  • n (typedef) – Number of peaks to sample.

  • seed (typedef) – RNG seed to ensure reproducibility.

Returns:

Fresh instance populated with sampled peaks.

Return type:

PeakIO
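
Seeded sampling of this kind can be sketched with the standard library. The function below pools records across chromosomes and draws n of them without replacement using a private random.Random instance, so repeated calls with the same seed return the same selection. The pooling order and data layout are assumptions; the library may sample differently.

```python
import random


def sample_peaks(peaks, n, seed=12345):
    """Return a new dict holding n records sampled without replacement."""
    # Pool records in a deterministic order so the seed fully fixes the result.
    pooled = [(chrom, p) for chrom in sorted(peaks) for p in peaks[chrom]]
    rng = random.Random(seed)
    sampled = {}
    for chrom, p in rng.sample(pooled, n):
        sampled.setdefault(chrom, []).append(p)
    return sampled


peaks = {b"chr1": [(100, 200), (300, 400)], b"chr2": [(10, 20), (50, 90)]}
first = sample_peaks(peaks, 2)
second = sample_peaks(peaks, 2)
print(first == second)
```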

read_from_xls(ofhd)

Load peak records from a MACS3 .xls tab-delimited report.

Parameters:

ofhd – Readable file-like object positioned at the beginning of the report.
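
A rough sketch of parsing such a report follows. It assumes a layout of "#" comment lines, one header row, then ten tab-separated columns (chr, start, end, length, abs_summit, pileup, -log10(pvalue), fold_enrichment, -log10(qvalue), name), and that start and summit are 1-based in the file and converted to 0-based in memory. Both the layout and the conversion are assumptions to verify against an actual report.

```python
import io


def parse_xls_report(handle):
    """Parse a tab-delimited peak report into {chrom: [record, ...]}."""
    peaks = {}
    header_seen = False
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if not header_seen:
            header_seen = True  # first remaining line is the column header
            continue
        fields = line.split("\t")
        peaks.setdefault(fields[0], []).append({
            "start": int(fields[1]) - 1,   # assumed 1-based in the file
            "end": int(fields[2]),
            "summit": int(fields[4]) - 1,  # assumed 1-based in the file
            "pileup": float(fields[5]),
            "pscore": float(fields[6]),
            "fc": float(fields[7]),
            "qscore": float(fields[8]),
            "name": fields[9],
        })
    return peaks


report = io.StringIO(
    "# Command line: example\n"
    "\n"
    "chr\tstart\tend\tlength\tabs_summit\tpileup\t-log10(pvalue)\t"
    "fold_enrichment\t-log10(qvalue)\tname\n"
    "chr1\t101\t300\t200\t151\t25.0\t12.3\t6.1\t9.8\tMACS_peak_1\n"
)
parsed = parse_xls_report(report)
print(parsed)
```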

sort()

Sort peaks on each chromosome by ascending start position.

to_summits_bed()

Write peak summits in BED5 format to stdout.

Each summit is emitted as a one-base interval with the selected score column.
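
The one-base interval can be sketched as follows: the summit position becomes the BED start and summit + 1 becomes the end. The function name, the dict-based record, and the number formatting are assumptions for illustration.

```python
def summit_bed_line(chrom, peak, peak_name, score_column="score"):
    """Format a summit as a one-base BED5 line (bytes, no trailing newline)."""
    summit = peak["summit"]
    return b"%s\t%d\t%d\t%s\t%.5f" % (
        chrom, summit, summit + 1, peak_name, peak[score_column]
    )


# Hypothetical record for demonstration.
peak = {"start": 100, "end": 300, "summit": 150, "score": 42.0}
print(summit_bed_line(b"chr1", peak, b"MACS_peak_1"))
```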

tobed()

Write peaks in BED5 format to stdout.

The five columns correspond to chromosome, start, end, name, and the attribute selected by score_column.

total = None

write_to_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=True)

Write peaks to a file handle in BED5 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak names.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for optional header line.

  • score_column (str) – Peak attribute to emit as the score field.

  • trackline (bool) – Whether to emit a UCSC track header line.

write_to_narrowPeak(fhd, name_prefix=b'%s_peak_', name=b'MACS', score_column='score', trackline=False)

Write peaks in the ENCODE narrowPeak (BED6+4) format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to construct peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • score_column (str) – Peak attribute mapped to the narrowPeak score field.

  • trackline (bool) – Whether to emit a UCSC track header.
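
A narrowPeak record extends the six BED columns with signalValue, pValue, qValue, and a final column holding the summit offset from the peak start. The sketch below formats one line under several assumptions: the score is clamped to the 0 to 1000 range the format allows, fold change fills signalValue, and the strand is written as ".". It is illustrative and may not match the library's exact output.

```python
def narrowpeak_line(chrom, peak, peak_name, score_column="score"):
    """Format one record as a 10-column narrowPeak line (bytes, no trailing newline)."""
    score = min(1000, max(0, int(peak[score_column])))  # clamp to 0..1000
    offset = peak["summit"] - peak["start"]             # summit relative to start
    return b"%s\t%d\t%d\t%s\t%d\t.\t%.5f\t%.5f\t%.5f\t%d" % (
        chrom, peak["start"], peak["end"], peak_name, score,
        peak["fc"], peak["pscore"], peak["qscore"], offset,
    )


# Hypothetical record for demonstration.
peak = {"start": 100, "end": 300, "summit": 150, "score": 1234.0,
        "fc": 6.1, "pscore": 12.3, "qscore": 9.8}
fields = narrowpeak_line(b"chr1", peak, b"MACS_peak_1").split(b"\t")
print(fields)
```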

write_to_summit_bed(fhd, name_prefix=b'%s_peak_', name=b'MACS', description=b'%s', score_column='score', trackline=False)

Write peak summits to a file handle in BED5 format.

Parameters:
  • fhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build summit names.

  • name (bytes) – Dataset label interpolated into name_prefix.

  • description (bytes) – Track description for optional header line.

  • score_column (str) – Peak attribute to emit as the score field.

  • trackline (bool) – Whether to emit a UCSC track header line.

write_to_xls(ofhd, name_prefix=b'%s_peak_', name=b'MACS')

Export narrow peaks to a tab-delimited .xls text file.

Parameters:
  • ofhd – Writable file-like object.

  • name_prefix (bytes) – Template used to build peak identifiers.

  • name (bytes) – Dataset label interpolated into name_prefix.

class MACS3.IO.PeakIO.RegionIO

Bases: object

Helper for storing and manipulating simple genomic regions.

add_loc(chrom, start, end)

Append a new (start, end) interval for chrom.

get_chr_names()

Return chromosome names present in the region set.

Return type:

set

merge_overlap()

Merge overlapping intervals within each chromosome.
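
Interval merging of this kind is commonly done by sorting and sweeping. The sketch below is one such approach on plain tuples; treating intervals that merely touch (start equal to the previous end) as mergeable is an assumption, and the library's rule may differ.

```python
def merge_intervals(regions):
    """Merge overlapping (start, end) intervals per chromosome."""
    merged = {}
    for chrom, intervals in regions.items():
        result = []
        for start, end in sorted(intervals):
            if result and start <= result[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                result[-1] = (result[-1][0], max(result[-1][1], end))
            else:
                result.append((start, end))
        merged[chrom] = result
    return merged


regions = {b"chr1": [(300, 400), (100, 200), (150, 250), (250, 260)]}
print(merge_intervals(regions))
```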

regions: dict

sort()

Sort regions for each chromosome by their start coordinate.

write_to_bed(fhd)

Emit regions in BED format to the provided file-like object.

MACS3.IO.PeakIO.subpeak_letters(i)

Return the alphabetical label for a zero-based subpeak index.

Parameters:

i (typedef) – Zero-based subpeak index.

Returns:

Alphabetical label sequence (a, b, …, aa).

Return type:

str
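
One way to produce the sequence a, b, …, z, aa, ab, … is bijective base-26 numbering, sketched below. The function name subpeak_label is hypothetical, and this scheme is an assumption chosen to match the sequence shown above; the library's labels beyond "z" may be generated differently.

```python
def subpeak_label(i):
    """Return a spreadsheet-style lowercase label for a zero-based index."""
    label = ""
    n = i + 1  # bijective numbering works on 1-based values
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("a") + rem) + label
    return label


print([subpeak_label(i) for i in (0, 1, 25, 26, 27)])
```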