MACS3.Signal.FixWidthTrack module

Module providing the FWTrack class for fixed-width fragment tracks, plus a few module-level helper functions.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.Signal.FixWidthTrack.FWTrack(fw=0, anno='', buffer_size=100000)

Bases: object

Fixed-width fragment track grouped by chromosome.

Stores plus- and minus-strand 5’ cut positions in numpy arrays and exposes utilities for sorting, filtering, sampling, and pileup generation.

add_loc(chromosome, fiveendpos, strand)

Append a 5’ cut position to the track.

Parameters:

chromosome (bytes) – Chromosome name (as bytes) that owns the cut.
fiveendpos (int) – Zero-based 5’ coordinate of the cut site.
strand (int) – Strand flag where 0 denotes plus and 1 denotes minus.

Notes

Positions are stored in strand-specific numpy arrays keyed by chromosome, and the strand pointer is advanced as new positions are appended.
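The per-chromosome, per-strand storage with a pointer can be sketched in plain Python. ToyFWTrack is a hypothetical stand-in, not the MACS3 implementation. It uses the standard array module instead of numpy. Its growth rule is an assumption inferred from the buffer_size and buf_size attributes: when a strand buffer is full, it is extended by another buffer_size slots.

```python
from array import array


class ToyFWTrack:
    """Hypothetical stand-in for FWTrack's storage; not the MACS3 code."""

    def __init__(self, buffer_size=100000):
        self.buffer_size = buffer_size
        self.locations = {}  # chromosome -> [plus_buffer, minus_buffer]
        self.pointer = {}    # chromosome -> [n_plus, n_minus] slots in use

    def add_loc(self, chromosome, fiveendpos, strand):
        if chromosome not in self.locations:
            # First cut on this chromosome: preallocate one buffer per strand.
            self.locations[chromosome] = [array("i", [0] * self.buffer_size),
                                          array("i", [0] * self.buffer_size)]
            self.pointer[chromosome] = [0, 0]
        buf = self.locations[chromosome][strand]
        i = self.pointer[chromosome][strand]
        if i >= len(buf):
            # Assumed growth rule: add another buffer_size slots when full.
            buf.extend([0] * self.buffer_size)
        buf[i] = fiveendpos
        # Advance the strand pointer so it counts the stored positions.
        self.pointer[chromosome][strand] = i + 1


track = ToyFWTrack(buffer_size=2)
for pos in (300, 100, 200):
    track.add_loc(b"chr1", pos, 0)
track.add_loc(b"chr1", 150, 1)
print(track.pointer[b"chr1"])                 # [3, 1]
print(list(track.locations[b"chr1"][0][:3]))  # [300, 100, 200]
```

At this stage positions are unsorted and buffers may contain unused slots. Trimming and sorting is the job of finalize().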

annotation = None

buf_size: dict

buffer_size = None

compute_region_tags_from_peaks(peaks, func, window_size=100, cutoff=5.0)

Apply a summary function to tags collected around peak regions.

Parameters:

peaks (MACS3.IO.PeakIO.PeakIO) – Peak container providing genomic intervals and metadata.
func (callable) – Callback invoked as func(chrom, plus, minus, startpos, endpos, ...) for each peak. The callable must accept window_size and cutoff keyword arguments.
window_size (int, optional) – Half-window size added on each side of every peak when collecting tags.
cutoff (float, optional) – Additional threshold passed to func.

Returns:

Results returned by func for each processed peak.

Return type:

list

Notes

Both the track and the peaks object are sorted before iteration, and per-chromosome state is reused to avoid rescanning arrays.
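The callback contract can be illustrated with a small driver. Several parts are hypothetical stand-ins:

- count_strand_tags and compute_region_tags_sketch are made-up names.
- Peaks are modelled as a dict of (start, end) tuples rather than a PeakIO object.
- The sketch assumes func receives the original peak bounds together with the tags found in the peak extended by window_size on each side, using an inclusive window.
- It uses binary search instead of the per-chromosome state reuse described above.

```python
from bisect import bisect_left, bisect_right


def count_strand_tags(chrom, plus, minus, startpos, endpos, *args,
                      window_size=100, cutoff=5.0, **kwargs):
    """Hypothetical callback: count tags near a peak and compare with cutoff."""
    n = len(plus) + len(minus)
    return (chrom, startpos, endpos, n, n >= cutoff)


def compute_region_tags_sketch(locations, peaks, func, window_size=100, cutoff=5.0):
    """Call func once per peak with the tags inside the peak +/- window_size."""
    results = []
    for chrom in sorted(peaks):
        plus, minus = (sorted(a) for a in locations.get(chrom, ([], [])))
        for startpos, endpos in sorted(peaks[chrom]):
            lo, hi = startpos - window_size, endpos + window_size
            # Inclusive window on both ends (an assumption of this sketch).
            p = plus[bisect_left(plus, lo):bisect_right(plus, hi)]
            m = minus[bisect_left(minus, lo):bisect_right(minus, hi)]
            results.append(func(chrom, p, m, startpos, endpos,
                                window_size=window_size, cutoff=cutoff))
    return results


locations = {b"chr1": ([90, 150, 400], [210, 260])}
peaks = {b"chr1": [(100, 200)]}
print(compute_region_tags_sketch(locations, peaks, count_strand_tags))
# [(b'chr1', 100, 200, 4, False)]
```

Accepting *args and **kwargs keeps the callback tolerant of any extra arguments the real caller may pass.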

destroy()

Release numpy buffers held by the track.

All per-chromosome arrays are resized to zero length so their memory can be returned to the allocator, and the track is marked as destroyed.

extract_region_tags(chromosome, startpos, endpos)

Collect positions within a genomic window for both strands.

Parameters:

chromosome (bytes) – Chromosome identifier to query.
startpos (int) – Inclusive start coordinate of the window.
endpos (int) – Inclusive end coordinate of the window.

Returns:

Pair of numpy arrays (plus, minus) containing positions inside the requested window.

Return type:

tuple[numpy.ndarray, numpy.ndarray]

Notes

The track is sorted on demand before performing the windowed lookup.
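On sorted coordinates, an inclusive window lookup reduces to two binary searches per strand. The sketch below is illustrative and uses plain lists and the bisect module. For numpy arrays, numpy.searchsorted with side="left" and side="right" plays the same role. The helper name extract_window is hypothetical.

```python
from bisect import bisect_left, bisect_right


def extract_window(sorted_positions, startpos, endpos):
    """Return positions p with startpos <= p <= endpos from an ascending list."""
    lo = bisect_left(sorted_positions, startpos)   # first index with p >= startpos
    hi = bisect_right(sorted_positions, endpos)    # first index with p > endpos
    return sorted_positions[lo:hi]


plus = [5, 10, 10, 20, 35]
minus = [8, 22, 40]
print(extract_window(plus, 10, 22), extract_window(minus, 10, 22))  # [10, 10, 20] [22]
```

This is why sorting must happen first: binary search is only valid on ordered data.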

filter_dup(maxnum=-1)

Limit duplicate 5’ positions to a maximum count per strand.

Parameters:

maxnum (int, optional) – Maximum number of occurrences allowed per coordinate. A negative value disables duplicate filtering.

Returns:

Total number of retained positions across both strands after filtering.

Return type:

int

Notes

The track must be sorted before filtering. At each coordinate only the first maxnum occurrences are kept and any further duplicates are discarded. The per-strand pointers are then updated and the total and length counters recomputed.
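Because duplicates are adjacent in a sorted array, filtering can be done in a single pass that tracks the length of the current run of equal values. This is a plain-Python sketch of the idea with a hypothetical helper name (filter_dup_sorted), not the MACS3 code.

```python
def filter_dup_sorted(positions, maxnum=-1):
    """Keep at most maxnum copies of each coordinate in an ascending sequence."""
    if maxnum < 0:
        return list(positions)  # negative maxnum: filtering disabled
    kept = []
    prev, run = None, 0
    for p in positions:
        run = run + 1 if p == prev else 1  # length of the current run of equal values
        prev = p
        if run <= maxnum:
            kept.append(p)
    return kept


plus = [100, 100, 100, 120, 150, 150]
minus = [90, 90, 95]
plus_f, minus_f = filter_dup_sorted(plus, 1), filter_dup_sorted(minus, 1)
print(plus_f, minus_f, len(plus_f) + len(minus_f))  # [100, 120, 150] [90, 95] 5
```

On unsorted input the run detection would miss non-adjacent duplicates, which is why sorting is a prerequisite.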

finalize()

Shrink arrays and sort per-strand coordinates in place.

Each chromosome’s plus- and minus-strand arrays are resized to the observed counts, sorted ascending, and aggregate counters such as total and length are refreshed. Call this after loading data.
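The trim-and-sort step can be sketched on plain lists. The sketch makes two assumptions. First, buffers maps each chromosome to [plus_buffer, minus_buffer], with pointers holding the used counts, mirroring the locations and pointer attributes. Second, only total is recomputed; the length counter is left out because its exact definition is not given here. finalize_strands is a hypothetical name.

```python
def finalize_strands(buffers, pointers):
    """Trim each strand buffer to its used length, sort it, and return the total count."""
    total = 0
    for chrom, strands in buffers.items():
        for s in (0, 1):  # 0 = plus, 1 = minus
            n = pointers[chrom][s]
            strands[s] = sorted(strands[s][:n])  # drop unused preallocated slots
            total += n
    return total


buffers = {b"chr1": [[30, 10, 20, 0], [7, 0, 0, 0]]}
pointers = {b"chr1": [3, 1]}
print(finalize_strands(buffers, pointers), buffers)  # 4 {b'chr1': [[10, 20, 30], [7]]}
```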

fw = None

get_chr_names()

Return the set of chromosome names stored in the track.

Returns:

Chromosome names (bytes) that currently have positions. A Python set is unordered, so apply sorted() to the result when a deterministic order is needed.

Return type:

set

get_locations_by_chr(chromosome)

Return the strand-specific arrays for a chromosome.

Parameters:

chromosome (bytes) – Chromosome name, provided as bytes.

Returns:

Pair of numpy arrays (plus, minus) containing 5’ positions.

Return type:

tuple[numpy.ndarray, numpy.ndarray]

Raises:

Exception – If the chromosome is not present in the track.

get_rlengths()

Return the reference chromosome lengths associated with the track.

Returns:

Mapping from chromosome name (bytes) to reference length. Chromosomes without a recorded length default to INT_MAX.

Return type:

dict

is_destroyed: _fake_callable

is_sorted: _fake_callable

length = None

locations: dict

pileup_a_chromosome(chrom, d, scale_factor=1.0, baseline_value=0.0, directional=True, end_shift=0)

Compute a coverage pileup for a single chromosome.

Parameters:

chrom (bytes) – Chromosome name to pile up.
d (int) – Extension length applied in the 3’ direction unless directional is False.
scale_factor (float, optional) – Value used to scale the resulting coverage.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
directional (bool, optional) – If False, extend cuts symmetrically to both sides by d / 2.
end_shift (int, optional) – Shift applied to the 5’ cuts before extension; positive values move toward the 3’ direction.

Returns:

Two-element list [positions, values] with numpy arrays describing the pileup breakpoints and scaled coverage.

Return type:

list
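A single-chromosome pileup can be sketched as a sweep over fragment start (+1) and end (−1) events. This is an illustration under assumptions, not the MACS3 code.

When directional is True, the sketch assumes:

- A plus-strand cut p covers [p + end_shift, p + end_shift + d).
- A minus-strand cut m covers [m − end_shift − d, m − end_shift).

When directional is False, both fragments are centred on the shifted cut with total width d.

Starts are clipped at 0. Any clipping at the chromosome end is omitted. The output follows the two-element [positions, values] layout described above, assuming values[k] holds from positions[k−1] (or 0) up to positions[k]. pileup_sketch is a hypothetical name.

```python
def pileup_sketch(plus, minus, d, scale_factor=1.0, baseline_value=0.0,
                  directional=True, end_shift=0):
    """Return [positions, values]; values[k] holds from positions[k-1] (or 0) to positions[k]."""
    if directional:
        # Extend only toward the 3' end of each (shifted) cut.
        five_shift, three_shift = -end_shift, end_shift + d
    else:
        # Split d across both sides of the shifted cut.
        five_shift, three_shift = d // 2 - end_shift, end_shift + d - d // 2
    intervals = [(p - five_shift, p + three_shift) for p in plus]
    intervals += [(m - three_shift, m + five_shift) for m in minus]

    # +1 where a fragment starts, -1 where it ends (starts clipped at 0).
    events = {}
    for start, end in intervals:
        start = max(start, 0)
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    positions, values = [], []
    depth, prev = 0, 0
    for pos in sorted(events):
        if events[pos] == 0:
            continue  # starts and ends cancel out: coverage does not change here
        if pos > prev:
            # Close the segment [prev, pos) at the current depth.
            positions.append(pos)
            values.append(max(depth * scale_factor, baseline_value))
            prev = pos
        depth += events[pos]
    return [positions, values]


print(pileup_sketch([10], [50], 20))                     # [[10, 50], [0.0, 1.0]]
print(pileup_sketch([10], [50], 20, directional=False))  # [[20, 40, 60], [1.0, 0.0, 1.0]]
```

In the first call, the plus fragment [10, 30) and the minus fragment [30, 50) abut at 30. Their start and end events cancel there, so no breakpoint is emitted at 30.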

pileup_a_chromosome_c(chrom, ds, scale_factor_s, baseline_value=0.0, directional=True, end_shift=0)

Compute a control pileup using multiple extension lengths.

Parameters:

chrom (bytes) – Chromosome name to pile up.
ds (list[int]) – Extension lengths used to build individual pileups.
scale_factor_s (list[float]) – Scale factors paired with each entry in ds.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
directional (bool, optional) – If False, extend cuts symmetrically to both sides by half of each extension length d in ds.
end_shift (int, optional) – Shift applied to the 5’ cuts before extension; positive values move toward the 3’ direction.

Returns:

Two-element list [positions, values] representing the merged pileup where the maximum value is taken across the supplied extensions.

Return type:

list
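Taking the maximum across several pileups amounts to merging step functions over the union of their breakpoints. The sketch below covers only that merge step. It uses a hypothetical helper (max_merge) and assumes each input uses the [positions, values] layout, with values[k] covering the stretch that ends at positions[k]. How MACS3 combines the per-extension pileups internally is not shown.

```python
from bisect import bisect_left


def max_merge(tracks, fill=0.0):
    """Merge [positions, values] step functions by taking the pointwise maximum.

    values[k] applies from positions[k-1] (or 0) up to positions[k]; beyond a
    track's last position its value is taken to be `fill`.
    """
    breakpoints = sorted({p for positions, _ in tracks for p in positions})
    out_pos, out_val = [], []
    for x in breakpoints:
        candidates = []
        for positions, values in tracks:
            # The segment ending at x lies inside this track's segment k.
            k = bisect_left(positions, x)
            candidates.append(values[k] if k < len(values) else fill)
        best = max(candidates)
        if out_val and out_val[-1] == best:
            out_pos[-1] = x  # same value as before: extend the previous segment
        else:
            out_pos.append(x)
            out_val.append(best)
    return [out_pos, out_val]


a = [[10, 30], [1.0, 2.0]]
b = [[20, 40], [3.0, 0.5]]
print(max_merge([a, b]))  # [[20, 30, 40], [3.0, 2.0, 0.5]]
```

In a control pileup, each input would be the pileup for one entry of ds, already scaled by the matching entry of scale_factor_s.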

pointer: dict

print_to_bed(fhd=None)

Stream the track as BED records.

Parameters:

fhd (io.IOBase, optional) – Writable file-like object. Defaults to sys.stdout.

Notes

Emits one record per stored position with fixed-width intervals derived from fw and strand-specific orientation.
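One plausible layout for these records is shown below as a hedged sketch with a hypothetical function name. It assumes plus-strand cuts become [pos, pos + fw) and minus-strand cuts become [pos − fw, pos). It also assumes six BED columns with "." placeholders for name and score. Clipping minus-strand starts at 0 is a choice made in this sketch to keep the BED output valid.

```python
import io
import sys


def print_to_bed_sketch(locations, fw, fhd=None):
    """Write one BED line per cut; locations maps chrom (bytes) -> (plus, minus)."""
    fhd = fhd if fhd is not None else sys.stdout
    for chrom in sorted(locations):
        plus, minus = locations[chrom]
        name = chrom.decode()
        for p in plus:
            # Plus strand: the fragment extends fw bp downstream of the cut.
            fhd.write(f"{name}\t{p}\t{p + fw}\t.\t.\t+\n")
        for m in minus:
            # Minus strand: the fragment extends fw bp upstream (clipped at 0 here).
            fhd.write(f"{name}\t{max(m - fw, 0)}\t{m}\t.\t.\t-\n")


out = io.StringIO()
print_to_bed_sketch({b"chr1": ([100], [250])}, 50, out)
print(out.getvalue(), end="")
```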

rlengths: dict

sample_num(samplesize, seed=-1)

Down-sample positions in place so the total approximates samplesize.

Parameters:

samplesize (int) – Target number of positions across both strands.
seed (int, optional) – Seed forwarded to sample_percent().

Notes

The method converts samplesize into a sampling fraction using the current total and reuses sample_percent().

sample_percent(percent, seed=-1)

Down-sample positions in place by a fixed percentage.

Parameters:

percent (float) – Fraction of positions to keep on each strand, between 0 and 1 inclusive.
seed (int, optional) – Seed forwarded to NumPy’s random number generator; a negative value skips seeding and uses the current global state.

Notes

Sampling is performed independently for plus and minus strands by shuffling each array, resizing to the requested fraction, and restoring sort order. Aggregate counters total and length are refreshed.
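The shuffle, truncate and re-sort procedure, and the count-to-fraction conversion used by sample_num(), can be sketched with Python's random module in place of NumPy's generator. Both function names are hypothetical. Rounding the kept count down with int() is an assumption.

```python
import random


def sample_percent_sketch(positions, percent, seed=-1):
    """Keep about `percent` of one strand's positions, returned in ascending order."""
    # Negative seed: use the module-level (global) generator without reseeding.
    rng = random.Random(seed) if seed >= 0 else random
    shuffled = list(positions)
    rng.shuffle(shuffled)
    keep = int(len(shuffled) * percent)  # assumed: round the kept count down
    return sorted(shuffled[:keep])       # restore sort order


def sample_num_sketch(plus, minus, samplesize, seed=-1):
    """Convert a target count into a fraction of the current total, then sample each strand."""
    total = len(plus) + len(minus)
    if total == 0:
        return [], []
    percent = samplesize / total  # values above 1 simply keep every position
    return (sample_percent_sketch(plus, percent, seed),
            sample_percent_sketch(minus, percent, seed))


p, m = sample_num_sketch(list(range(60)), list(range(40)), 50, seed=7)
print(len(p), len(m))  # 30 20
```

Because each strand is sampled separately, the combined total only approximates samplesize when per-strand counts are rounded.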

set_rlengths(rlengths)

Attach reference chromosome lengths to the track.

Parameters:

rlengths (dict) – Mapping from chromosome name (bytes) to reference length.

Returns:

True when the length mapping has been updated.

Return type:

bool

Notes

Any chromosome stored in the track but missing from rlengths is assigned INT_MAX so downstream bounds checks can succeed.
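The defaulting rule can be sketched as a dictionary merge. The sketch assumes INT_MAX is the largest 32-bit signed C integer, 2**31 − 1. The helper name fill_rlengths is hypothetical.

```python
INT_MAX = 2**31 - 1  # assumed: the largest 32-bit signed C int


def fill_rlengths(chrom_names, rlengths):
    """Copy rlengths and give every unlisted chromosome a length of INT_MAX."""
    merged = dict(rlengths)
    for chrom in chrom_names:
        # setdefault keeps any length the caller supplied.
        merged.setdefault(chrom, INT_MAX)
    return merged


print(fill_rlengths({b"chr1", b"chrM"}, {b"chr1": 1000}))
```

An effectively unbounded length means a check such as "position < length" never rejects a cut on a chromosome whose true length is unknown.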

sort()

Sort per-strand coordinate arrays for every chromosome.

Positions are ordered ascending on each strand and the is_sorted flag is set to True once sorting completes.

total = None

MACS3.Signal.FixWidthTrack.bool(*args, **kwargs)

MACS3.Signal.FixWidthTrack.left_forward(data, pos, window_size)

Return type:

typedef

MACS3.Signal.FixWidthTrack.left_sum(data, pos, width)

Return type:

typedef

MACS3.Signal.FixWidthTrack.right_forward(data, pos, window_size)

Return type:

typedef

MACS3.Signal.FixWidthTrack.right_sum(data, pos, width)

Return type:

typedef