MACS3.Signal.BedGraph module
Module for BedGraph data class.
This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).
- class MACS3.Signal.BedGraph.bedGraphTrackI(baseline_value=0)
Bases:
object
Sparse representation of a bedGraph signal track.
The implementation assumes regions are continuous and non-overlapping within each chromosome. Coordinates are stored as 0-based, right-open transition points paired with the preceding value.
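The transition-point layout can be illustrated with a small pure-Python sketch. The class name SparseTrack and its attributes are hypothetical and are not part of MACS3. The sketch only mirrors the documented idea of storing each region's right-open end position together with the value that precedes it. Filling uncovered gaps with the baseline value is a choice made here to keep the toy track continuous, not a statement about what MACS3 does.

```python
class SparseTrack:
    """Hypothetical stand-in for one chromosome of a bedGraph track."""

    def __init__(self, baseline_value=0):
        self.baseline_value = baseline_value
        self.ends = []    # right-open end position of each region
        self.values = []  # value held from the previous end up to this end

    def add_loc(self, start, end, value, merge=True):
        """Append [start, end); regions must arrive sorted and non-overlapping."""
        last_end = self.ends[-1] if self.ends else 0
        if start < last_end:
            raise ValueError("regions must be sorted and non-overlapping")
        if start > last_end:
            # Assumption of this sketch: fill the gap with the baseline value.
            self._push(start, self.baseline_value, merge)
        self._push(end, value, merge)

    def _push(self, end, value, merge):
        if merge and self.values and self.values[-1] == value:
            self.ends[-1] = end  # extend the previous region instead of adding one
        else:
            self.ends.append(end)
            self.values.append(value)

    def regions(self):
        """Yield (start, end, value) tuples reconstructed from the transitions."""
        start = 0
        for end, value in zip(self.ends, self.values):
            yield (start, end, value)
            start = end


track = SparseTrack()
track.add_loc(0, 100, 0)
track.add_loc(100, 150, 3)
track.add_loc(150, 200, 3)   # same value as the previous region: merged
track.add_loc(250, 300, 4)   # gap 200-250 is filled with the baseline
result = list(track.regions())
```

Only one end position and one value are kept per region, so a start coordinate is always implied by the previous end.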
- add_chrom_data(chromosome, p, v)
Replace the data stored for chromosome with the position array p and the value array v.
- add_chrom_data_PV(chromosome, pv)
Replace the data stored for chromosome with the structured numpy array pv.
- add_loc(chromosome, startpos, endpos, value)
Append [startpos, endpos) with value on chromosome.
The caller is responsible for providing non-overlapping, sorted regions. Adjacent regions with identical values are merged.
- add_loc_wo_merge(chromosome, startpos, endpos, value)
Append [startpos, endpos) without merging identical neighbours.
- apply_func(func)
Apply function ‘func’ to every value in this bedGraphTrackI object.
- Return type:
_fake_callable
*Two adjacent regions that end up with the same value after func is applied are not merged.
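The no-merge behaviour noted for apply_func can be shown with a minimal sketch. The helper below is a hypothetical stand-alone function working on plain lists, not the MACS3 method itself.

```python
def apply_func(values, func):
    """Return a new list with func applied to every region value."""
    return [func(v) for v in values]


ends = [100, 200, 300]          # right-open end positions, left untouched
values = [1.0, -1.0, 4.0]
new_values = apply_func(values, abs)
# The first two regions now share a value but remain separate entries,
# so a merge step would be needed to coalesce them.
```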
- baseline_value = None
- call_broadpeaks(lvl1_cutoff=500, lvl2_cutoff=100, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)
Return broad peaks built from high- and low-stringency thresholds.
- Parameters:
lvl1_cutoff (typedef) – Signal threshold for core enriched segments.
lvl2_cutoff (typedef) – Lower threshold for linking segments.
min_length (typedef) – Minimum length for reported peaks.
lvl1_max_gap (typedef) – Maximum gap size when merging level-1 segments.
lvl2_max_gap (typedef) – Maximum allowed length for linking segments.
- Returns:
Broad peaks constructed from level-1/level-2 segments.
- Return type:
- call_peaks(cutoff=1, min_length=200, max_gap=50, call_summits=False)
Return narrow peaks where signal stays above cutoff.
Segments above the cutoff are merged when separated by gaps no larger than max_gap. Peaks shorter than min_length are discarded.
- Parameters:
cutoff (typedef) – Minimum signal value for inclusion.
min_length (typedef) – Minimum peak length (in bases) to report.
max_gap (typedef) – Maximum distance between adjacent segments to merge.
call_summits (_fake_callable) – Reserved flag; summits are always computed.
- Returns:
Discrete peak intervals with summit annotations.
- Return type:
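The merge-and-filter logic described for call_peaks can be sketched in plain Python. This is an illustration under stated assumptions, not the MACS3 implementation: the function works on (start, end, value) tuples for a single chromosome, treats values equal to the cutoff as passing, and places the summit at the midpoint of the highest segment. The dictionary keys are hypothetical.

```python
def call_peaks(regions, cutoff=1, min_length=200, max_gap=50):
    """regions: sorted, non-overlapping (start, end, value) tuples."""
    candidates = []  # each entry is a list of above-cutoff segments
    for start, end, value in regions:
        if value < cutoff:
            continue
        if candidates and start - candidates[-1][-1][1] <= max_gap:
            candidates[-1].append((start, end, value))  # close enough: same peak
        else:
            candidates.append([(start, end, value)])    # start a new candidate

    peaks = []
    for segments in candidates:
        peak_start, peak_end = segments[0][0], segments[-1][1]
        if peak_end - peak_start < min_length:
            continue  # too short to report
        s, e, v = max(segments, key=lambda seg: seg[2])
        peaks.append({"start": peak_start, "end": peak_end,
                      "summit": (s + e) // 2, "height": v})
    return peaks


regions = [
    (0, 100, 0.0),
    (100, 250, 5.0),
    (250, 280, 0.5),    # 30 bp dip below the cutoff, within max_gap
    (280, 400, 8.0),
    (400, 1000, 0.0),
    (1000, 1050, 6.0),  # only 50 bp long: shorter than min_length
    (1050, 1200, 0.0),
]
peaks = call_peaks(regions, cutoff=1, min_length=200, max_gap=50)
```

With this input the two high segments are joined across the short dip into one peak, and the isolated 50 bp segment is dropped.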
- cutoff_analysis(max_gap, min_length, steps=100, min_score=0, max_score=1000)
Summarise peak metrics across a range of score thresholds.
- Parameters:
max_gap (typedef) – Maximum distance between merged regions.
min_length (typedef) – Minimum peak length to keep.
steps (typedef) – Number of cutoff increments between the observed minimum and maximum scores.
min_score (typedef) – Lower bound for the cutoff sweep.
max_score (typedef) – Upper bound for the cutoff sweep.
- Returns:
Tab-delimited report of peak counts and lengths per cutoff.
- Return type:
str
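A cutoff sweep of this kind can be sketched as follows. The helper names, the column headers and the number formatting are assumptions made for illustration; the actual report layout produced by MACS3 may differ.

```python
def count_peaks(regions, cutoff, max_gap, min_length):
    """Return (number of peaks, total peak length) for one cutoff."""
    spans = []
    for start, end, value in regions:
        if value < cutoff:
            continue
        if spans and start - spans[-1][1] <= max_gap:
            spans[-1][1] = end          # extend the current span
        else:
            spans.append([start, end])  # open a new span
    lengths = [e - s for s, e in spans if e - s >= min_length]
    return len(lengths), sum(lengths)


def cutoff_analysis(regions, max_gap, min_length, steps=4, min_score=0, max_score=8):
    """Build a tab-delimited report, sweeping from strict to loose cutoffs."""
    step = (max_score - min_score) / steps
    lines = ["score\tnpeaks\tlpeaks\tavelpeak"]  # hypothetical column names
    for i in range(steps, 0, -1):
        cutoff = min_score + i * step
        n, total = count_peaks(regions, cutoff, max_gap, min_length)
        if n:  # skip cutoffs that yield no peaks
            lines.append(f"{cutoff:.2f}\t{n}\t{total}\t{total / n:.2f}")
    return "\n".join(lines) + "\n"


regions = [(0, 100, 0.0), (100, 400, 4.0), (400, 600, 0.0),
           (600, 900, 8.0), (900, 1000, 0.0)]
report = cutoff_analysis(regions, max_gap=50, min_length=200)
```

Reading the report from top to bottom shows how the peak count and total peak length grow as the cutoff is relaxed.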
- destroy()
Release stored chromosome data and reset caches.
- Return type:
_fake_callable
- extract_value(bdgTrack2)
Extract values from regions defined in bedGraphTrackI class object bdgTrack2.
- extract_value_hmmr(bdgTrack2)
Extract the values of this track (self) in the regions defined by the bedGraphTrackI object bdgTrack2. The function is intended to output only the values of self within those regions.
This is specifically for HMMRATAC. bdgTrack2 should be a bedGraph object containing the bins, with each value set to ‘mark_bin’, so that bins in the same region have the same value.
- filter_score(cutoff=0)
Clamp regions below cutoff to self.baseline_value.
- Return type:
_fake_callable
- get_chr_names()
Return the set of chromosomes currently stored.
- Return type:
set
- get_data_by_chr(chromosome)
Return (positions, values) arrays for chromosome.
- Return type:
list
- make_ScoreTrackII_for_macs(bdgTrack2, depth1=1.0, depth2=1.0)
A modified overlie function for MACS v2.
- effective_depth_in_million: sequencing depth in millions after duplicates have been filtered. If treatment is scaled down to the control sample size, this should be the control sample size in millions, and vice versa.
Return value is a ScoreTrackII object.
- maxvalue = None
- merge_regions()
Coalesce adjacent segments that share the same value.
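Coalescing can be sketched on the transition-point layout with two parallel lists. The function below is a hypothetical stand-alone helper, not the MACS3 method.

```python
def merge_regions(ends, values):
    """Merge neighbouring regions that carry the same value."""
    merged_ends, merged_values = [], []
    for end, value in zip(ends, values):
        if merged_values and merged_values[-1] == value:
            merged_ends[-1] = end  # extend the previous region
        else:
            merged_ends.append(end)
            merged_values.append(value)
    return merged_ends, merged_values


ends = [100, 150, 200, 300]
values = [0, 3, 3, 4]
merged = merge_regions(ends, values)
```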
- minvalue = None
- overlie(bdgTracks, func='max')
Combine two or more bedGraphTrackI objects by overlaying self on the tracks in bdgTracks and applying the selected function.
Transition positions from all bedGraphTrackI objects are considered and combined. For example:
#1 bedGraph (self) | #2 bedGraph
chr1 0 100 0 | chr1 0 150 1
chr1 100 200 3 | chr1 150 250 2
chr1 200 300 4 | chr1 250 300 4
These two bedGraphs are combined to have five transition points: 100, 150, 200, 250, and 300. To calculate across the two bedGraphs, values are paired within the following regions:
chr s e (#1,#2) applied_func_max
chr1 0 100 (0,1) 1
chr1 100 150 (3,1) 3
chr1 150 200 (3,2) 3
chr1 200 250 (4,2) 4
chr1 250 300 (4,4) 4
The given ‘func’ is then applied to each 2-tuple as func(#1,#2).
Supported ‘func’ values are “sum”, “subtract” (only for two bdg objects), “product”, “divide” (only for two bdg objects), “max”, “mean” and “fisher”.
The return value is a new bedGraphTrackI object.
bdgTracks can be a list of bedGraphTrackI objects.
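The pairing of values at combined transition points can be sketched for two tracks with a simple two-pointer walk. The function and the (ends, values) tuple layout are hypothetical and only illustrate the idea. The sketch assumes both tracks cover the same total span, and its input is chosen to reproduce the pairing table in the example.

```python
def overlie(track1, track2, func=max):
    """Each track is (ends, values): right-open end positions and their values."""
    ends1, vals1 = track1
    ends2, vals2 = track2
    i = j = 0
    out_ends, out_vals = [], []
    while i < len(ends1) and j < len(ends2):
        end = min(ends1[i], ends2[j])            # next transition in either track
        out_ends.append(end)
        out_vals.append(func(vals1[i], vals2[j]))
        if ends1[i] == end:
            i += 1                               # region in track 1 is used up
        if ends2[j] == end:
            j += 1                               # region in track 2 is used up
    return out_ends, out_vals


track1 = ([100, 200, 300], [0, 3, 4])   # bedGraph #1 (self)
track2 = ([150, 250, 300], [1, 2, 4])   # bedGraph #2
combined = overlie(track1, track2, func=max)
```

Passing a different two-argument function, for example one that adds its arguments, changes the combined values without changing the transition points.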
- p2q()
Convert pvalue scores to qvalue scores.
- *Assumes that the scores in this bedGraph are p-value scores; it does not work for other types of scores.
- refine_peaks(peaks)
Recalculate peak bounds and summits from an initial PeakIO.
- reset_baseline(baseline_value)
Reset baseline_value and clamp regions below the new baseline.
- set_single_value(new_value)
Set all values in the bedGraph to the same new_value and return a new bedGraphTrackI.
- summary()
Return global summary statistics for the track.
- Returns:
(sum, length, max, min, mean, std_dev) evaluated across all chromosomes.
- Return type:
tuple
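One way to compute such a tuple from sparse regions is to weight each value by the length of its region. The sketch below treats every base as one observation and uses the population standard deviation; both choices are assumptions and may not match the MACS3 implementation.

```python
import math


def summary(regions):
    """regions: (start, end, value) tuples; returns length-weighted statistics."""
    length = sum(e - s for s, e, _ in regions)
    total = sum((e - s) * v for s, e, v in regions)
    mean = total / length
    # Population variance, with each base counted as one observation.
    variance = sum((e - s) * (v - mean) ** 2 for s, e, v in regions) / length
    values = [v for _, _, v in regions]
    return (total, length, max(values), min(values), mean, math.sqrt(variance))


stats = summary([(0, 100, 0.0), (100, 200, 4.0)])
```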
- total()
Return the number of regions in this object.
- Return type:
typedef
- class MACS3.Signal.BedGraph.bedGraphTrackII(baseline_value=0, buffer_size=100000)
Bases:
object
NumPy-backed variant of bedGraphTrackI.
- add_chrom_data(chromosome, pv)
Replace the data stored for chromosome with the structured numpy array pv.
- add_loc(chromosome, startpos, endpos, value)
Append [startpos, endpos) with value on chromosome.
- add_loc_wo_merge(chromosome, startpos, endpos, value)
Append [startpos, endpos) without merging adjacent values.
- baseline_value = None
- buffer_size: int
- call_broadpeaks(lvl1_cutoff=500, lvl2_cutoff=100, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)
Return broad peaks built from high- and low-stringency thresholds.
- call_peaks(cutoff=1.0, min_length=200, max_gap=50, call_summits=False)
Return narrow peaks where signal stays above cutoff.
- destroy()
Release allocated arrays and reset chromosome metadata.
- Return type:
_fake_callable
- filter_score(cutoff=0)
Retain only segments with values greater than cutoff.
- Return type:
_fake_callable
- finalize()
Trim underlying arrays to size and update cached min/max values.
- get_chr_names()
Return the set of chromosomes currently stored.
- Return type:
set
- get_data_by_chr(chromosome)
Return the structured array for chromosome or None.
- Return type:
_fake_callable
- maxvalue = None
- minvalue = None
- refine_peaks(peaks)
Recalculate peak bounds and summits from an initial PeakIO.
- summary()
Return (sum, length, max, min, mean, std_dev) across the track.
- Return type:
tuple
- total()
Return the number of regions in this object.
- Return type:
typedef
- MACS3.Signal.BedGraph.bool(*args, **kwargs)
- MACS3.Signal.BedGraph.calculate_elbows(values, threshold=0.01)
- Return type:
_fake_callable
- MACS3.Signal.BedGraph.divide_func(x)
Return the ratio x[1] / x[2].
- MACS3.Signal.BedGraph.fisher_func(x)
Combine -log10 p-values in x using Fisher’s method.
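Fisher's method sums the natural-log p-values into a statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. A minimal sketch working directly on -log10 p-values is shown below. The function name is hypothetical, and whether MACS3 evaluates the tail probability this way is not confirmed by this page. The closed-form survival function used here is valid because the degrees of freedom are always even.

```python
import math


def fisher_combine(mlog10_pvalues):
    """Combine k -log10 p-values and return the combined -log10 p-value."""
    k = len(mlog10_pvalues)
    # half_stat is X / 2, where X = -2 * sum(ln p_i) = 2 * ln(10) * sum(x_i).
    half_stat = math.log(10) * sum(mlog10_pvalues)
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    log_p = -half_stat + math.log(tail)
    return -log_p / math.log(10)


single = fisher_combine([2.0])        # one p-value is returned unchanged
pair = fisher_combine([2.0, 2.0])     # two p-values of 0.01 each
```

Combining two p-values of 0.01 yields a result slightly below 3 on the -log10 scale, which is less extreme than simply adding the two scores.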
- MACS3.Signal.BedGraph.log10(*args, **kwargs)
- MACS3.Signal.BedGraph.mean_func(x)
Return the arithmetic mean of x.
- MACS3.Signal.BedGraph.product_func(x)
Return the product of all values in x.
- MACS3.Signal.BedGraph.sqrt(*args, **kwargs)
- MACS3.Signal.BedGraph.subtract_func(x)
Return the difference x[1] - x[0].