MACS3.Signal.ScoreTrack module

Scoring utilities for MACS3 signal tracks and peak callers.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.Signal.ScoreTrack.ScoreTrackII(treat_depth, ctrl_depth, pseudocount=1.0)

Bases: object

Container for treatment/control pileups and derived score tracks.

add(chromosome, endpos, chip, control): Append treatment/control pileup ending at endpos for chromosome.

add_chromosome(chrom, chrom_max_len): Allocate arrays for chrom with capacity chrom_max_len.

call_broadpeaks(lvl1_cutoff=5.0, lvl2_cutoff=1.0, min_length=200, lvl1_max_gap=50, lvl2_max_gap=400)

Return broad peaks constructed from high- and low-cutoff segments.

Parameters:

lvl1_cutoff (typedef) – Threshold for core enriched segments.
lvl2_cutoff (typedef) – Threshold for linking segments.
min_length (typedef) – Minimum peak length to report.
lvl1_max_gap (typedef) – Maximum gap when merging level-1 segments.
lvl2_max_gap (typedef) – Maximum allowed length for linking segments.
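The two-level idea can be illustrated with a small, self-contained sketch. It assumes that a broad peak is a low-cutoff (linking) region containing at least one high-cutoff (core) segment. The helper name `link_broad_peaks` and the `(start, end)` tuple layout are hypothetical and are not part of MACS3.

```python
def link_broad_peaks(core_segments, linking_regions):
    """Hypothetical helper: both inputs are sorted lists of (start, end)."""
    broad_peaks = []
    for region_start, region_end in linking_regions:
        # Collect the high-cutoff segments that fall inside this low-cutoff region.
        cores = [
            (start, end)
            for start, end in core_segments
            if start >= region_start and end <= region_end
        ]
        if cores:  # assumption: a region without any core segment is not reported
            broad_peaks.append((region_start, region_end, cores))
    return broad_peaks


core_segments = [(120, 180), (300, 420), (2000, 2100)]
linking_regions = [(100, 500), (900, 1200), (1950, 2150)]
broad = link_broad_peaks(core_segments, linking_regions)
```

In this example the middle linking region is dropped because no core segment lies inside it.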

call_peaks(cutoff=5.0, min_length=200, max_gap=50, call_summits=False)

Return peaks where scores remain above cutoff.

Parameters:

cutoff (typedef) – Minimum score threshold (e.g., -log10 p).
min_length (typedef) – Minimum peak length in bases.
max_gap (typedef) – Maximum distance between merged segments.
call_summits (_fake_callable) – Whether to report all local maxima within peaks.
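The thresholding, gap merging, and length filter behind these parameters can be sketched in plain Python. This is a minimal illustration, not the MACS3 implementation: the function name `merge_regions`, the `(start, end, score)` tuple layout, and the choice of `>=` for the cutoff comparison are assumptions made for the example.

```python
def merge_regions(intervals, cutoff=5.0, min_length=200, max_gap=50):
    """Hypothetical helper: intervals is a sorted list of (start, end, score)."""
    peaks = []
    current = None  # [start, end] of the peak currently being extended
    for start, end, score in intervals:
        if score < cutoff:
            continue  # interval does not pass the threshold
        if current is not None and start - current[1] <= max_gap:
            current[1] = end  # close enough: extend the current peak
        else:
            # Gap too large: flush the current peak if it is long enough.
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append(tuple(current))
            current = [start, end]
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append(tuple(current))
    return peaks


intervals = [
    (0, 100, 1.0),
    (100, 250, 6.0),
    (250, 280, 2.0),   # 30 bp dip below the cutoff, bridged by max_gap
    (280, 400, 7.5),
    (400, 900, 0.5),
    (900, 950, 9.0),   # passes the cutoff but is shorter than min_length
]
peaks = merge_regions(intervals)
```

Here the two high-scoring intervals on either side of the 30 bp dip merge into one 300 bp peak, while the isolated 50 bp interval is discarded.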

change_normalization_method(normalization_method)

Change or set the normalization method. Switching back and forth is not recommended: values are kept to only two digits, so precision is lost.

normalization_method:

T: scale to depth of treatment
C: scale to depth of control
M: scale to depth of 1 million
N: not set / raw pileup

change_score_method(scoring_method)

scoring_method:

p: -log10 pvalue
q: -log10 qvalue
l: log10 likelihood ratio (minus for depletion)
s: symmetric log10 likelihood ratio (for comparing two ChIPs)
f: log10 fold enrichment
F: linear fold enrichment
d: subtraction
M: maximum
m: fragment pileup per million reads

compute_SPMR(): Populate scores with treatment pileup per million reads.

compute_foldenrichment(): Calculate linear scale fold enrichment (with 1 pseudocount).

compute_likelihood(): Calculate log10 likelihood.

compute_logFE(): Calculate log10 fold enrichment (with 1 pseudocount).

compute_max(): Populate scores with the element-wise maximum of treatment and control.

compute_pvalue(): Compute -log_{10}(pvalue)

compute_qvalue(): Compute -log_{10}(qvalue)

compute_subtraction(): Populate scores with treatment minus control pileup.

compute_sym_likelihood(): Calculate symmetric log10 likelihood.

ctrl_edm: typedef

cutoff: typedef

cutoff_analysis(max_gap=50, min_length=200, steps=100, min_score=0, max_score=1000)

Summarise peak metrics across a range of score thresholds.

Parameters:

max_gap (typedef) – Maximum distance between merged regions.
min_length (typedef) – Minimum peak length to keep.
steps (typedef) – Number of cutoff increments between the observed minimum and maximum scores.
min_score (typedef) – Lower bound for the cutoff sweep.
max_score (typedef) – Upper bound for the cutoff sweep.

Returns:

Tab-delimited report of peak counts and lengths per cutoff.

Return type:

str
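A cutoff sweep of this kind can be sketched as follows. The report columns, the evenly spaced cutoffs, and both helper names are assumptions for illustration; the actual report layout produced by cutoff_analysis may differ.

```python
def call_regions(intervals, cutoff, min_length, max_gap):
    """Merge (start, end, score) intervals scoring >= cutoff; keep long ones."""
    merged = []
    for start, end, score in intervals:
        if score < cutoff:
            continue
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= min_length]


def cutoff_report(intervals, min_score, max_score, steps, min_length=200, max_gap=50):
    """Return a tab-delimited report with one row per cutoff (hypothetical layout)."""
    lines = ["cutoff\tnpeaks\ttotal_length"]
    step = (max_score - min_score) / steps
    for i in range(steps):
        cutoff = min_score + i * step
        regions = call_regions(intervals, cutoff, min_length, max_gap)
        total = sum(e - s for s, e in regions)
        lines.append(f"{cutoff:.2f}\t{len(regions)}\t{total}")
    return "\n".join(lines)


intervals = [(0, 300, 2.0), (300, 600, 6.0), (600, 1000, 0.5)]
report = cutoff_report(intervals, min_score=0.0, max_score=8.0, steps=4)
```

Reading the report from top to bottom shows how the covered length shrinks as the cutoff rises, which is the information needed to choose a threshold.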

data: dict

datalength: dict

enable_trackline(): Enable UCSC track line output when exporting bedGraphs.

finalize(): Trim per-chromosome arrays to their populated length.

get_chr_names(): Return all the chromosome names stored.

get_data_by_chr(chromosome): Return (positions, treatment, control, score) arrays for chromosome.

make_pq_table()

Make pvalue-qvalue table.

Step 1: Get all pvalues and the length of the blocks with each pvalue.
Step 2: Sort them.
Step 3: Apply the AFDR method to adjust the pvalues and get a qvalue for each pvalue.

Return a dictionary of {-log10pvalue:(-log10qvalue,rank,basepairs)} relationships.
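The three steps can be sketched in -log10 space. This example assumes a Benjamini-Hochberg style adjustment, q = p * N / rank, where N is the total number of base pairs and rank is the rank of the first base pair carrying a given score. That reading of the adjustment, the input dictionary, and the function name are assumptions, not a description of the MACS3 code.

```python
import math


def build_pq_table(pvalue_lengths):
    """pvalue_lengths maps -log10(pvalue) -> total base pairs with that score."""
    total = sum(pvalue_lengths.values())
    log10_total = math.log10(total)
    table = {}
    rank = 1                  # rank of the first base pair with the current score
    prev_q = float("inf")
    # Step 2: visit scores from most to least significant.
    for score in sorted(pvalue_lengths, reverse=True):
        length = pvalue_lengths[score]
        # Step 3: -log10(p * N / rank) = score + log10(rank) - log10(N)
        q = score + math.log10(rank) - log10_total
        # Keep q-scores non-increasing and never below zero.
        q = max(0.0, min(prev_q, q))
        table[score] = (q, rank, length)
        prev_q = q
        rank += length
    return table


# Step 1 (assumed already done): 10 bp at -log10(p) = 5, 90 bp at -log10(p) = 2.
table = build_pq_table({5.0: 10, 2.0: 90})
```

The returned dictionary has the same shape as the one described above: each -log10 pvalue maps to a (-log10 qvalue, rank, basepairs) tuple.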

normalization_method: typedef

normalize(treat_scale, control_scale): Scale treatment and control pileups in-place by the given factors.

pseudocount: typedef

pvalue_stat: dict

scoring_method: typedef

set_pseudocount(pseudocount): Update the pseudocount used when computing score metrics.

total()

Return the number of regions in this object.

Return type: typedef

trackline: _fake_callable

treat_edm: typedef

write_bedGraph(fhd, name, description, column=3)

Write all data to fhd in bedGraph format.

fhd: a file handle to write the bedGraph to.

name/description: the name and description used in the track line.

column: which values to write; 1: chip, 2: control, 3: score.
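The bedGraph layout itself is simple: an optional track line followed by tab-separated chrom, start, end, and value columns. The sketch below writes one chromosome from a list of end positions and values. The function name, its signature, and the five-decimal formatting are assumptions for illustration and do not mirror the method above.

```python
import io


def write_bedgraph(fhd, chrom, end_positions, values, name=None, description=None):
    """Hypothetical writer: each interval starts where the previous one ended."""
    if name is not None:
        fhd.write(f'track type=bedGraph name="{name}" description="{description}"\n')
    start = 0
    for end, value in zip(end_positions, values):
        fhd.write(f"{chrom}\t{start}\t{end}\t{value:.5f}\n")
        start = end


buf = io.StringIO()
write_bedgraph(buf, "chr1", [100, 250, 400], [0.0, 3.2, 1.5],
               name="demo", description="demo track")
output = buf.getvalue()
```

Any object with a `write` method works as `fhd`, so the same function can target an open file or an in-memory buffer.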

class MACS3.Signal.ScoreTrack.TwoConditionScores(t1bdg, c1bdg, t2bdg, c2bdg, cond1_factor=1.0, cond2_factor=1.0, pseudocount=0.01, proportion_background_empirical_distribution=0.99999)

Bases: object

Class for saving two condition comparison scores.

add(chromosome, endpos, t1, c1, t2, c2)

Take chr-endpos-sample1-control1-sample2-control2 and compute logLR for t1 vs c1, t2 vs c2, and t1 vs t2, then save values.

chromosome: chromosome name (string)
endpos: end position of each interval (integer)
t1: Sample 1 ChIP pileup value of each interval (float)
c1: Sample 1 Control pileup value of each interval (float)
t2: Sample 2 ChIP pileup value of each interval (float)
c2: Sample 2 Control pileup value of each interval (float)

Warning: Regions need to be added continuously.

add_chromosome(chrom, chrom_max_len): Allocate storage for chrom with capacity chrom_max_len.

build(): Compute scores from 3 types of comparisons and store them in self.data.

build_chromosome(chrname, cond1_treat_ps, cond1_control_ps, cond2_treat_ps, cond2_control_ps, cond1_treat_vs, cond1_control_vs, cond2_treat_vs, cond2_control_vs)

Internal function to calculate scores for three types of comparisons.

cond1_treat_ps, cond1_control_ps: position of treat and control of condition 1
cond2_treat_ps, cond2_control_ps: position of treat and control of condition 2
cond1_treat_vs, cond1_control_vs: value of treat and control of condition 1
cond2_treat_vs, cond2_control_vs: value of treat and control of condition 2

c1bdg: object

c2bdg: object

call_peaks(cutoff=3, min_length=200, max_gap=100, call_summits=False)

Find regions within which scores are continuously higher than a given cutoff.

For bdgdiff.

This function does NOT use sliding windows. Instead, any region in the bedGraph above the cutoff is detected, and nearby regions are merged if the gap between them is below max_gap. A peak is then reported if its length is above min_length.

cutoff: cutoff of value, default 3. For log10 LR, it means 1000 or -1000.
min_length: minimum peak length, default 200.
max_gap: maximum gap to merge nearby peaks, default 100.
ptrack: an optional track for pileup heights. If it is not None, use it to find summits. Otherwise, use self/scoreTrack.

Return type: tuple

cond1_factor: typedef

cond2_factor: typedef

cutoff: typedef

data: dict

datalength: dict

finalize(): Adjust array size of each chromosome.

get_chr_names(): Return all the chromosome names stored.

get_common_chrs()

Return chromosome names shared across all input bedGraphs.

Return type: set
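Finding the shared chromosomes amounts to a set intersection over the names in each input. A minimal sketch, with a hypothetical helper that takes plain lists of names instead of bedGraph objects:

```python
def common_chromosomes(*chromosome_lists):
    """Return the names present in every input list (hypothetical helper)."""
    sets = [set(names) for names in chromosome_lists]
    return set.intersection(*sets) if sets else set()


common = common_chromosomes(
    ["chr1", "chr2", "chrX"],   # stands in for t1bdg
    ["chr1", "chr2"],           # stands in for c1bdg
    ["chr2", "chr1", "chrM"],   # stands in for t2bdg
    ["chr1", "chr2", "chr3"],   # stands in for c2bdg
)
```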

get_data_by_chr(chromosome)

Return array of counts by chromosome.

The return value is a tuple: ([end pos],[value])

mean_from_peakcontent(peakcontent)

Return type: typedef

pseudocount: typedef

pvalue_stat1: dict

pvalue_stat2: dict

pvalue_stat3: dict

set_pseudocount(pseudocount): Update the pseudocount used for differential scoring.

t1bdg: object

t2bdg: object

total()

Return the number of regions in this object.

Return type: typedef

write_bedGraph(fhd, name, description, column=3)

Write all data to fhd in bedGraph format.

fhd: a file handle to write the bedGraph to.

name/description: the name and description used in the track line.

column: which comparison to write; 1: cond1 chip vs cond1 ctrl, 2: cond2 chip vs cond2 ctrl, 3: cond1 chip vs cond2 chip.

write_matrix(fhd, name, description)

Write all data to fhd into five columns Format:

col1: chr_start_end
col2: t1 vs c1
col3: t2 vs c2
col4: t1 vs t2

fhd: a file handle to write the matrix to.

MACS3.Signal.ScoreTrack.bool(*args, **kwargs)

MACS3.Signal.ScoreTrack.get_logFE(x, y)

Return the log10 fold enrichment of x over y.

Return type: typedef

MACS3.Signal.ScoreTrack.get_pscore(observed, expectation)

Return cached -log10 Poisson tail probability for observed.

Return type: typedef
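A cached -log10 Poisson tail score can be sketched with the standard library alone. The tail convention used here, P(X >= observed), and the use of `functools.lru_cache` for caching are assumptions; the direct summation is also only suitable for moderate tails, since `1 - cdf` loses precision when the tail probability is very small.

```python
import math
from functools import lru_cache


def poisson_upper_tail(observed, expectation):
    """P(X >= observed) for X ~ Poisson(expectation), by direct summation."""
    term = math.exp(-expectation)  # P(X = 0)
    cdf_below = 0.0                # accumulates P(X < observed)
    for i in range(observed):
        cdf_below += term
        term *= expectation / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf_below)


@lru_cache(maxsize=None)
def pscore(observed, expectation):
    """Cached -log10 of the upper-tail probability (hypothetical helper)."""
    p = poisson_upper_tail(observed, expectation)
    return -math.log10(p) if p > 0.0 else float("inf")


score_low = pscore(5, 2.0)
score_high = pscore(10, 2.0)
```

Caching pays off because pileup tracks contain many positions with the same (observed, expectation) pair.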

MACS3.Signal.ScoreTrack.get_subtraction(x, y)

Return the difference x - y.

Return type: typedef

MACS3.Signal.ScoreTrack.int_max(a, b)

Return the larger of a and b.

Return type: typedef

MACS3.Signal.ScoreTrack.int_min(a, b)

Return the smaller of a and b.

Return type: typedef

MACS3.Signal.ScoreTrack.log(*args, **kwargs)

MACS3.Signal.ScoreTrack.log10(*args, **kwargs)

MACS3.Signal.ScoreTrack.logLR_asym(x, y)

Return asymmetric log10 likelihood ratio between x and y.

Return type: typedef

MACS3.Signal.ScoreTrack.logLR_sym(x, y)

Return symmetric log10 likelihood ratio between x and y.

Return type: typedef
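The asymmetric and symmetric log10 likelihood ratios can be illustrated under a Poisson model, where the log likelihood ratio of observing x under rate x versus rate y is x * (ln x - ln y) + y - x. The formulas and function names below are a sketch under that assumption, for strictly positive inputs, and are not confirmed to match the module's implementation.

```python
import math

LOG10_E = math.log10(math.e)  # converts natural-log values to log10


def _poisson_loglr(x, y):
    """Non-negative log10 likelihood ratio of rate x versus rate y, given x."""
    return (x * (math.log(x) - math.log(y)) + y - x) * LOG10_E


def loglr_asym(x, y):
    """Positive for enrichment (x > y), negative for depletion (x < y)."""
    if x == y:
        return 0.0
    magnitude = _poisson_loglr(x, y)
    return magnitude if x > y else -magnitude


def loglr_sym(x, y):
    """Antisymmetric under swapping x and y, for comparing two ChIPs."""
    if x == y:
        return 0.0
    return _poisson_loglr(x, y) if x > y else -_poisson_loglr(y, x)


enriched = loglr_asym(10.0, 1.0)
depleted = loglr_asym(1.0, 10.0)
```

The symmetric form gives equal magnitude and opposite sign when the two inputs are swapped, which the asymmetric form does not guarantee.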