MACS3.Signal.BedGraph module

Module for BedGraph data class.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.Signal.BedGraph.bedGraphTrackI(baseline_value=0)

Bases: object

Sparse representation of a bedGraph signal track.

The implementation assumes regions are continuous and non-overlapping within each chromosome. Coordinates are stored as 0-based, right-open transition points paired with the preceding value.

add_chrom_data(chromosome, p, v)

Replace chromosome data using position and value arrays.

add_chrom_data_PV(chromosome, pv)

Replace chromosome data from a structured numpy array pv.

add_loc(chromosome, startpos, endpos, value)

Append [startpos, endpos) with value on chromosome.

The caller is responsible for providing non-overlapping, sorted regions. Adjacent regions with identical values are merged.

add_loc_wo_merge(chromosome, startpos, endpos, value)

Append [startpos, endpos) without merging identical neighbours.

apply_func(func)

Apply function ‘func’ to every value in this bedGraphTrackI object.

Return type:

_fake_callable

*Two adjacent regions with same value after applying func will not be merged.

baseline_value = None
call_broadpeaks(lvl1_cutoff=500, lvl2_cutoff=100, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)

Return broad peaks built from high- and low-stringency thresholds.

Parameters:
  • lvl1_cutoff (typedef) – Signal threshold for core enriched segments.

  • lvl2_cutoff (typedef) – Lower threshold for linking segments.

  • min_length (typedef) – Minimum length for reported peaks.

  • lvl1_max_gap (typedef) – Maximum gap size when merging level-1 segments.

  • lvl2_max_gap (typedef) – Maximum allowed length for linking segments.

Returns:

Broad peaks constructed from level-1/level-2 segments.

Return type:

BroadPeakIO

call_peaks(cutoff=1, min_length=200, max_gap=50, call_summits=False)

Return narrow peaks where signal stays above cutoff.

Segments above the cutoff are merged when separated by gaps no larger than max_gap. Peaks shorter than min_length are discarded.

Parameters:
  • cutoff (typedef) – Minimum signal value for inclusion.

  • min_length (typedef) – Minimum peak length (in bases) to report.

  • max_gap (typedef) – Maximum distance between adjacent segments to merge.

  • call_summits (_fake_callable) – Reserved flag; summits are always computed.

Returns:

Discrete peak intervals with summit annotations.

Return type:

PeakIO

cutoff_analysis(max_gap, min_length, steps=100, min_score=0, max_score=1000)

Summarise peak metrics across a range of score thresholds.

Parameters:
  • max_gap (typedef) – Maximum distance between merged regions.

  • min_length (typedef) – Minimum peak length to keep.

  • steps (typedef) – Number of cutoff increments between the observed minimum and maximum scores.

  • min_score (typedef) – Lower bound for the cutoff sweep.

  • max_score (typedef) – Upper bound for the cutoff sweep.

Returns:

Tab-delimited report of peak counts and lengths per cutoff.

Return type:

str

destroy()

Release stored chromosome data and reset caches.

Return type:

_fake_callable

extract_value(bdgTrack2)

Extract values from regions defined in bedGraphTrackI class object bdgTrack2.

extract_value_hmmr(bdgTrack2)

Extract values from regions defined in bedGraphTrackI class object bdgTrack2.

I will try to tweak this function to output only the values of bdgTrack1 (self) in the regions in bdgTrack2

This is specifically for HMMRATAC. bdgTrack2 should be a bedgraph object containing the bins with value set to ‘mark_bin’ – the bins in the same region will have the same value.

filter_score(cutoff=0)

Clamp regions below cutoff to self.baseline_value.

Return type:

_fake_callable

get_chr_names()

Return the set of chromosomes currently stored.

Return type:

set

get_data_by_chr(chromosome)

Return (positions, values) arrays for chromosome.

Return type:

list

make_ScoreTrackII_for_macs(bdgTrack2, depth1=1.0, depth2=1.0)

A modified overlie function for MACS v2.

effective_depth_in_million: sequencing depth in million after

duplicates being filtered. If treatment is scaled down to control sample size, then this should be control sample size in million. And vice versa.

Return value is a ScoreTrackII object.

maxvalue = None
merge_regions()

Coalesce adjacent segments that share the same value.

minvalue = None
overlie(bdgTracks, func='max')

Calculate two or more bedGraphTrackI objects by letting self overlying bdgTrack2, with user-defined functions.

Transition positions from both bedGraphTrackI objects will be considered and combined. For example:

#1 bedGraph (self) | #2 bedGraph

chr1 200 300 4 | chr1 250 300 4

these two bedGraphs will be combined to have five transition points: 100, 150, 200, 250, and 300. So in order to calculate two bedGraphs, I pair values within the following regions like:

chr s e (#1,#2) applied_func_max

chr1 0 100 (0,1) 1 chr1 100 150 (3,1) 3 chr1 150 200 (3,2) 3 chr1 200 250 (4,2) 4 chr1 250 300 (4,4) 4

Then the given ‘func’ will be applied on each 2-tuple as func(#1,#2)

Supported ‘func’ are “sum”, “subtract” (only for two bdg objects), “product”, “divide” (only for two bdg objects), “max”, “mean” and “fisher”.

Return value is a new bedGraphTrackI object.

Option: bdgTracks can be a list of bedGraphTrackI objects

p2q()

Convert pvalue scores to qvalue scores.

*Assume scores in this bedGraph are pvalue scores! Not work

for other type of scores.

refine_peaks(peaks)

Recalculate peak bounds and summits from an initial PeakIO.

reset_baseline(baseline_value)

Reset baseline_value and clamp regions below the new baseline.

set_single_value(new_value)

Change all the values in bedGraph to the same new_value, return a new bedGraphTrackI.

summary()

Return global summary statistics for the track.

Returns:

(sum, length, max, min, mean, std_dev) evaluated across all chromosomes.

Return type:

tuple

total()

Return the number of regions in this object.

Return type:

typedef

class MACS3.Signal.BedGraph.bedGraphTrackII(baseline_value=0, buffer_size=100000)

Bases: object

NumPy-backed variant of bedGraphTrackI.

add_chrom_data(chromosome, pv)

Replace chromosome data with the structured numpy array pv.

add_loc(chromosome, startpos, endpos, value)

Append [startpos, endpos) with value on chromosome.

add_loc_wo_merge(chromosome, startpos, endpos, value)

Append [startpos, endpos) without merging adjacent values.

baseline_value = None
buffer_size: int
call_broadpeaks(lvl1_cutoff=500, lvl2_cutoff=100, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)

Return broad peaks built from high- and low-stringency thresholds.

call_peaks(cutoff=1.0, min_length=200, max_gap=50, call_summits=False)

Return narrow peaks where signal stays above cutoff.

destroy()

Release allocated arrays and reset chromosome metadata.

Return type:

_fake_callable

filter_score(cutoff=0)

Retain only segments with values greater than cutoff.

Return type:

_fake_callable

finalize()

Trim underlying arrays to size and update cached min/max values.

get_chr_names()

Return the set of chromosomes currently stored.

Return type:

set

get_data_by_chr(chromosome)

Return the structured array for chromosome or None.

Return type:

_fake_callable

maxvalue = None
minvalue = None
refine_peaks(peaks)

Recalculate peak bounds and summits from an initial PeakIO.

summary()

Return (sum, length, max, min, mean, std_dev) across the track.

Return type:

tuple

total()

Return the number of regions in this object.

Return type:

typedef

MACS3.Signal.BedGraph.bool(*args, **kwargs)
MACS3.Signal.BedGraph.calculate_elbows(values, threshold=0.01)
Return type:

_fake_callable

MACS3.Signal.BedGraph.divide_func(x)

Return the ratio x[1] / x[2].

MACS3.Signal.BedGraph.fisher_func(x)

Combine -log10 p-values in x using Fisher’s method.

MACS3.Signal.BedGraph.log10(*args, **kwargs)
MACS3.Signal.BedGraph.mean_func(x)

Return the arithmetic mean of x.

MACS3.Signal.BedGraph.product_func(x)

Return the product of all values in x.

MACS3.Signal.BedGraph.sqrt(*args, **kwargs)
MACS3.Signal.BedGraph.subtract_func(x)

Return the difference x[1] - x[0].