MACS3.Signal.BedGraph module

Module for BedGraph data class.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.Signal.BedGraph.bedGraphTrackI

Bases: object

Sparse representation of a bedGraph signal track.

The implementation assumes regions are continuous and non-overlapping within each chromosome. Coordinates are stored as 0-based, right-open transition points paired with the preceding value.

add_chrom_data(chromosome, p, v)

Replace chromosome data using position and value arrays.

add_chrom_data_PV(chromosome, pv)

Replace chromosome data from a structured numpy array pv.

add_loc(chromosome, startpos, endpos, value)

Append [startpos, endpos) with value on chromosome.

The caller is responsible for providing non-overlapping, sorted regions. Adjacent regions with identical values are merged.

add_loc_wo_merge(chromosome, startpos, endpos, value)

Append [startpos, endpos) without merging identical neighbours.

apply_func(func)

Apply function ‘func’ to every value in this bedGraphTrackI object.

*Two adjacent regions with same value after applying func will not be merged.

baseline_value
call_broadpeaks(lvl1_cutoff=500.0, lvl2_cutoff=100.0, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)

Return broad peaks built from high- and low-stringency thresholds.

Parameters:
  • lvl1_cutoff (cython.float) – Signal threshold for core enriched segments.

  • lvl2_cutoff (cython.float) – Lower threshold for linking segments.

  • min_length (cython.int) – Minimum length for reported peaks.

  • lvl1_max_gap (cython.int) – Maximum gap size when merging level-1 segments.

  • lvl2_max_gap (cython.int) – Maximum allowed length for linking segments.

Returns:

Broad peaks constructed from level-1/level-2 segments.

Return type:

BroadPeakIO

call_peaks(cutoff=1.0, min_length=200, max_gap=50, call_summits=False)

Return narrow peaks where signal stays above cutoff.

Segments above the cutoff are merged when separated by gaps no larger than max_gap. Peaks shorter than min_length are discarded.

Parameters:
  • cutoff (cython.float) – Minimum signal value for inclusion.

  • min_length (cython.int) – Minimum peak length (in bases) to report.

  • max_gap (cython.int) – Maximum distance between adjacent segments to merge.

  • call_summits (bool) – Reserved flag; summits are always computed.

Returns:

Discrete peak intervals with summit annotations.

Return type:

PeakIO

cutoff_analysis(max_gap, min_length, steps=100, min_score=0.0, max_score=1000.0)

Summarise peak metrics across a range of score thresholds.

Parameters:
  • max_gap (cython.int) – Maximum distance between merged regions.

  • min_length (cython.int) – Minimum peak length to keep.

  • steps (cython.int) – Number of cutoff increments between the observed minimum and maximum scores.

  • min_score (cython.float) – Lower bound for the cutoff sweep.

  • max_score (cython.float) – Upper bound for the cutoff sweep.

Returns:

Tab-delimited report of peak counts and lengths per cutoff.

Return type:

str

destroy()

Release stored chromosome data and reset caches.

extract_value(bdgTrack2)

Extract values from regions defined in bedGraphTrackI class object bdgTrack2.

extract_value_hmmr(bdgTrack2)

Extract values from regions defined in bedGraphTrackI class object bdgTrack2.

I will try to tweak this function to output only the values of bdgTrack1 (self) in the regions in bdgTrack2

This is specifically for HMMRATAC. bdgTrack2 should be a bedgraph object containing the bins with value set to ‘mark_bin’ – the bins in the same region will have the same value.

filter_score(cutoff=0.0)

Clamp regions below cutoff to self.baseline_value.

get_chr_names()

Return the set of chromosomes currently stored.

get_data_by_chr(chromosome)

Return (positions, values) arrays for chromosome.

make_ScoreTrackII_for_macs(bdgTrack2, depth1=1.0, depth2=1.0)

A modified overlie function for MACS v2.

effective_depth_in_million: sequencing depth in million after

duplicates being filtered. If treatment is scaled down to control sample size, then this should be control sample size in million. And vice versa.

Return value is a ScoreTrackII object.

maxvalue
minvalue
overlie(bdgTracks, func='max')

Calculate two or more bedGraphTrackI objects by letting self overlying bdgTrack2, with user-defined functions.

Transition positions from both bedGraphTrackI objects will be considered and combined. For example:

#1 bedGraph (self) | #2 bedGraph

chr1 200 300 4 | chr1 250 300 4

these two bedGraphs will be combined to have five transition points: 100, 150, 200, 250, and 300. So in order to calculate two bedGraphs, I pair values within the following regions like:

chr s e (#1,#2) applied_func_max

chr1 0 100 (0,1) 1 chr1 100 150 (3,1) 3 chr1 150 200 (3,2) 3 chr1 200 250 (4,2) 4 chr1 250 300 (4,4) 4

Then the given ‘func’ will be applied on each 2-tuple as func(#1,#2)

Supported ‘func’ are “sum”, “subtract” (only for two bdg objects), “product”, “divide” (only for two bdg objects), “max”, “mean” and “fisher”.

Return value is a new bedGraphTrackI object.

Option: bdgTracks can be a list of bedGraphTrackI objects

p2q()

Convert pvalue scores to qvalue scores.

*Assume scores in this bedGraph are pvalue scores! Not work

for other type of scores.

refine_peaks(peaks)

Recalculate peak bounds and summits from an initial PeakIO.

reset_baseline(baseline_value)

Reset baseline_value and clamp regions below the new baseline.

set_single_value(new_value)

Change all the values in bedGraph to the same new_value, return a new bedGraphTrackI.

summary()

Return global summary statistics for the track.

Returns:

(sum, length, max, min, mean, std_dev) evaluated across all chromosomes.

Return type:

tuple

total()

Return the number of regions in this object.

class MACS3.Signal.BedGraph.bedGraphTrackII

Bases: object

NumPy-backed variant of bedGraphTrackI.

add_chrom_data(chromosome, pv)

Replace chromosome data with the structured numpy array pv.

add_loc(chromosome, startpos, endpos, value)

Append [startpos, endpos) with value on chromosome.

add_loc_wo_merge(chromosome, startpos, endpos, value)

Append [startpos, endpos) without merging adjacent values.

baseline_value
call_broadpeaks(lvl1_cutoff=500.0, lvl2_cutoff=100.0, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)

Return broad peaks built from high- and low-stringency thresholds.

call_peaks(cutoff=1.0, min_length=200, max_gap=50, call_summits=False)

Return narrow peaks where signal stays above cutoff.

destroy()

Release allocated arrays and reset chromosome metadata.

filter_score(cutoff=0.0)

Retain only segments with values greater than cutoff.

finalize()

Trim underlying arrays to size and update cached min/max values.

get_chr_names()

Return the set of chromosomes currently stored.

get_data_by_chr(chromosome)

Return the structured array for chromosome or None.

maxvalue
minvalue
refine_peaks(peaks)

Recalculate peak bounds and summits from an initial PeakIO.

summary()

Return (sum, length, max, min, mean, std_dev) across the track.

total()

Return the number of regions in this object.