MACS3.Signal.PairedEndTrack module
Module for filtering duplicate tags from paired-end data.
This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).
- class MACS3.Signal.PairedEndTrack.PETrackI(anno='', buffer_size=100000)
Bases: object
In-memory paired-end fragment container grouped by chromosome.
The track exposes utilities for sorting, filtering, downsampling, and pileup generation on numpy structured arrays of left/right coordinates.
- add_loc(chromosome, start, end)
Append a paired-end fragment to the track.
- Parameters:
chromosome (bytes) – Chromosome name (as bytes) that owns the fragment.
start (int) – Zero-based start coordinate of the fragment (5’ end).
end (int) – Zero-based end coordinate of the fragment (3’ end).
Notes
Fragments are stored in structured numpy arrays keyed by chromosome, and the running fragment count and total template length are updated in place.
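The bookkeeping described in the note above can be sketched in plain Python. This is only an illustration: `TinyPETrack` is a hypothetical stand-in, plain lists replace the numpy structured arrays, and using `length` for the running template-length sum is an assumption based on the attribute list below.

```python
from collections import defaultdict


class TinyPETrack:
    """Hypothetical stand-in for PETrackI (lists instead of numpy arrays)."""

    def __init__(self):
        self.locations = defaultdict(list)  # chromosome -> [(left, right), ...]
        self.total = 0                      # running fragment count
        self.length = 0                     # running sum of template lengths (assumed name)

    def add_loc(self, chromosome, start, end):
        # Store the fragment under its chromosome and update counters in place.
        self.locations[chromosome].append((start, end))
        self.total += 1
        self.length += end - start


track = TinyPETrack()
track.add_loc(b"chr1", 100, 250)
track.add_loc(b"chr1", 120, 300)
track.add_loc(b"chr2", 5, 55)
```

After these three calls the sketch holds three fragments with a combined template length of 380.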
- annotation = None
- average_template_length = None
- buf_size = None
- buffer_size = None
- count_fraglengths()
Count observed fragment lengths across the track.
- Returns:
Mapping from fragment length to observed count, useful for downstream models such as HMMRATAC.
- Return type:
dict
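As a rough sketch of the returned mapping, the following counts `end - start` over a dictionary of per-chromosome fragment lists. The free function and the list-based `locations` layout are assumptions for illustration, not the library's implementation.

```python
from collections import Counter


def count_fraglengths(locations):
    """Map each fragment length to the number of fragments with that length."""
    counts = Counter()
    for fragments in locations.values():
        for left, right in fragments:
            counts[right - left] += 1
    return dict(counts)


locations = {b"chr1": [(100, 250), (400, 550)], b"chr2": [(10, 210)]}
length_counts = count_fraglengths(locations)
```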
- destroy()
Release numpy buffers held by the track.
All per-chromosome arrays are resized to zero so the memory footprint is returned to the allocator, and the track is marked as destroyed.
- exclude(regions)
Remove fragments that overlap the provided exclusion regions.
- Parameters:
regions (MACS3.Signal.Region.Regions) – Sorted region collection whose intervals should be excluded.
Notes
The operation mutates the track in place and finishes by calling finalize() to refresh cached statistics.
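One way to picture the exclusion step is the sketch below. It assumes half-open intervals, that any overlap at all removes a fragment, and that the regions are sorted and non-overlapping; the real method works on a Regions object, whereas this hypothetical helper takes plain `(start, end)` tuples.

```python
from bisect import bisect_right


def exclude(fragments, regions):
    """Drop fragments overlapping any region (sorted, non-overlapping, half-open)."""
    starts = [s for s, _ in regions]
    ends = [e for _, e in regions]
    kept = []
    for left, right in fragments:
        # First region whose end lies beyond the fragment's left edge.
        i = bisect_right(ends, left)
        if i < len(regions) and starts[i] < right:
            continue  # the fragment overlaps that region, so it is removed
        kept.append((left, right))
    return kept


fragments = [(0, 50), (40, 120), (150, 200), (199, 260), (300, 350)]
blacklist = [(100, 130), (200, 250)]
remaining = exclude(fragments, blacklist)
```

Because the regions are sorted and disjoint, only the first region ending after the fragment's left edge needs to be checked.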
- filter_dup(maxnum=-1)
Limit the number of duplicate fragments at identical coordinates.
- Parameters:
maxnum (int, optional) – Maximum number of fragments allowed per unique (start, end) pair. A negative value disables duplicate filtering.
Notes
Fragments exceeding maxnum for the same coordinates are dropped and the aggregate template length is adjusted accordingly.
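A minimal sketch of this duplicate cap, assuming the fragments are already sorted so identical coordinates are adjacent. The standalone function and tuple representation are illustrative assumptions.

```python
def filter_dup(fragments, maxnum=-1):
    """Keep at most `maxnum` copies of each (left, right) pair.

    `fragments` must be sorted so duplicates sit next to each other.
    A negative `maxnum` disables filtering.
    """
    if maxnum < 0:
        return list(fragments)
    kept = []
    previous = None
    run = 0  # length of the current run of identical fragments
    for frag in fragments:
        run = run + 1 if frag == previous else 1
        previous = frag
        if run <= maxnum:
            kept.append(frag)
    return kept


frags = sorted([(10, 60), (10, 60), (10, 60), (10, 80), (30, 90)])
deduped = filter_dup(frags, maxnum=1)
# Aggregate template length recomputed from the surviving fragments.
new_length = sum(right - left for left, right in deduped)
```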
- finalize()
Shrink backing arrays and sort fragments in place.
Each per-chromosome array is resized to the observed fragment count, sorted by the left and right coordinates, and the aggregate counters total and average_template_length are refreshed. Call this after loading data.
- fraglengths()
Return all fragment lengths as a single array.
- Returns:
Concatenated array of end - start for every stored fragment across chromosomes.
- Return type:
numpy.ndarray
- get_chr_names()
Return the set of chromosome names stored in the track.
- Returns:
Chromosome names (bytes) that currently have fragments.
- Return type:
set
- get_locations_by_chr(chromosome)
Return the fragment array for a chromosome.
- Parameters:
chromosome (bytes) – Chromosome name, provided as bytes.
- Returns:
Structured array with ('l', 'i4') and ('r', 'i4') fields.
- Return type:
numpy.ndarray
- Raises:
Exception – If the chromosome is not present in the track.
- get_rlengths()
Return the reference chromosome lengths associated with the track.
- Returns:
Mapping from chromosome name (bytes) to reference length. Chromosomes without a recorded length default to INT_MAX.
- Return type:
dict
- is_destroyed: _fake_callable
- is_sorted = None
- length = None
- locations = None
- pileup_a_chromosome(chrom, scale_factor=1.0, baseline_value=0.0)
Compute a coverage pileup for a single chromosome.
- Parameters:
chrom (bytes) – Chromosome name to pile up.
scale_factor (float, optional) – Value used to scale the resulting coverage.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
- Returns:
Two-element list [positions, values] with numpy arrays describing the pileup breakpoints and scaled coverage.
- Return type:
list
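The `[positions, values]` shape can be illustrated with a small sweep-line sketch. It assumes each position marks the end of a run of constant coverage starting at the previous position (or 0), and that `baseline_value` acts as a floor applied after scaling. Both are assumptions about the convention, and plain lists replace the numpy arrays.

```python
def pileup(fragments, scale_factor=1.0, baseline_value=0.0):
    """Return [positions, values] for half-open (left, right) fragments."""
    # +1 where a fragment starts, -1 where it ends.
    events = {}
    for left, right in fragments:
        events[left] = events.get(left, 0) + 1
        events[right] = events.get(right, 0) - 1

    positions, values = [], []
    depth = 0  # number of fragments covering the current run
    prev = 0   # start of the current run
    for pos in sorted(events):
        if pos > prev:
            value = max(depth * scale_factor, baseline_value)
            if values and values[-1] == value:
                positions[-1] = pos  # extend the previous run
            else:
                positions.append(pos)
                values.append(value)
            prev = pos
        depth += events[pos]
    return [positions, values]


positions, values = pileup([(10, 50), (30, 70)])
```

Here the two overlapping fragments produce a coverage of 2 only on the shared interval from 30 to 50.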
- pileup_a_chromosome_c(chrom, ds, scale_factor_s, baseline_value=0.0)
Project paired-end fragments into pseudo single-end pileups.
- Parameters:
chrom (bytes) – Chromosome name to pile up.
ds (list[int]) – Fragment lengths used to build the projections.
scale_factor_s (list[float]) – Scale factors paired with each entry in ds.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
- Returns:
Two-element list [positions, values] representing the merged pileup with the maximum value taken across projections.
- Return type:
list
- pileup_bdg(scale_factor=1.0, baseline_value=0.0)
Build a bedGraphTrackI with pileups for every chromosome.
- Parameters:
scale_factor (float, optional) – Value used to scale the coverage for each chromosome.
baseline_value (float, optional) – Minimum value enforced on the coverage arrays.
- Returns:
BedGraph track populated with per-chromosome pileup data.
- Return type:
bedGraphTrackI
- pileup_bdg_hmmr(mapping, baseline_value=0.0)
Generate HMMRATAC-style pileups for every chromosome.
- Parameters:
mapping (list) – Weight mapping produced by HMMRATAC EM training describing the short, mono-, di-, and tri-nucleosomal signals.
baseline_value (float, optional) – Reserved parameter for API compatibility; not currently applied.
- Returns:
List of dictionaries mirroring mapping where each dictionary maps chromosome names to pileup arrays returned by pileup_from_LR_hmmratac().
- Return type:
list
- print_to_bed(fhd=None)
Write fragments to a three-column BEDPE-style stream.
- Parameters:
fhd (io.IOBase, optional) – Writable file-like object. Defaults to sys.stdout.
Notes
Each fragment is emitted as chrom start end using decoded chromosome names and the stored integer coordinates.
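A small sketch of the output format, assuming tab-separated columns and chromosomes written in sorted order. Neither detail is stated above, and the standalone function taking a `locations` dictionary is hypothetical.

```python
import io
import sys


def print_to_bed(locations, fhd=None):
    """Write `chrom start end` lines for every fragment."""
    out = sys.stdout if fhd is None else fhd
    for chrom in sorted(locations):
        name = chrom.decode()  # chromosome names are stored as bytes
        for left, right in locations[chrom]:
            out.write(f"{name}\t{left}\t{right}\n")


buffer = io.StringIO()
print_to_bed({b"chr1": [(100, 250), (120, 300)], b"chr2": [(5, 55)]}, buffer)
bed_text = buffer.getvalue()
```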
- rlengths = None
- sample_num(samplesize, seed=-1)
Down-sample fragments in place to approximately samplesize.
- Parameters:
samplesize (int) – Target number of fragments across all chromosomes.
seed (int, optional) – Deterministic seed forwarded to sample_percent().
Notes
The method converts samplesize into a sampling fraction using self.total. Ensure finalize() has been called so counts are up to date.
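The conversion from a target count to a fraction is simple arithmetic, sketched below. The helper name is hypothetical, and clamping to 1.0 when the target exceeds the total is an assumption rather than documented behavior.

```python
def sampling_fraction(samplesize, total):
    """Turn a target fragment count into a fraction of the current total."""
    if total <= 0:
        raise ValueError("total must be positive; finalize the track first")
    # Clamp so a target above the total keeps everything (assumed behavior).
    return min(1.0, samplesize / total)


fraction = sampling_fraction(2_500, 10_000)
```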
- sample_num_copy(samplesize, seed=-1)
Return a down-sampled copy with approximately samplesize fragments.
- Parameters:
samplesize (int) – Target number of fragments across all chromosomes.
seed (int, optional) – Deterministic seed forwarded to sample_percent_copy().
- Returns:
New track containing the sampled fragments.
- Return type:
PETrackI
- sample_percent(percent, seed=-1)
Down-sample fragments in place by a fixed percentage.
- Parameters:
percent (float) – Fraction of fragments to keep per chromosome between 0 and 1 (inclusive).
seed (int, optional) – Deterministic seed for the RNG; a negative value uses NumPy’s global state.
Notes
Sampling is performed independently for each chromosome by shuffling the fragments, resizing the arrays, and restoring coordinate order.
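The shuffle, resize, and re-sort sequence for one chromosome might look like the sketch below. It uses Python's `random` module instead of NumPy's RNG, and rounding the kept count to the nearest integer is an assumption.

```python
import random


def sample_percent(fragments, percent, seed=-1):
    """Keep roughly `percent` of one chromosome's fragments, returned sorted."""
    # A negative seed falls back to the module-level (global) generator.
    rng = random.Random(seed) if seed >= 0 else random
    shuffled = list(fragments)
    rng.shuffle(shuffled)                           # shuffle
    keep = int(round(len(shuffled) * percent))
    return sorted(shuffled[:keep])                  # truncate, then restore order


frags = [(i * 10, i * 10 + 100) for i in range(10)]
sampled = sample_percent(frags, 0.5, seed=7)
```

Repeating the call with the same non-negative seed yields the same subset.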
- sample_percent_copy(percent, seed=-1)
Return a down-sampled copy of the track.
- Parameters:
percent (float) – Fraction of fragments to retain per chromosome between 0 and 1.
seed (int, optional) – Deterministic seed used when shuffling; a negative value disables seeding.
- Returns:
New track containing the sampled fragments with metadata copied over.
- Return type:
PETrackI
- set_rlengths(rlengths)
Attach reference chromosome lengths to the track.
- Parameters:
rlengths (dict) – Mapping from chromosome name (bytes) to reference length.
- Returns:
True when the length mapping has been updated.
- Return type:
bool
Notes
Any chromosome stored in the track but missing from rlengths is assigned INT_MAX so downstream bounds checks can succeed.
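The fallback described in the note can be sketched as follows. The value of `INT_MAX` (the largest signed 32-bit integer) and the standalone helper `resolve_rlengths` are assumptions for illustration.

```python
INT_MAX = 2**31 - 1  # assumed: maximum signed 32-bit integer


def resolve_rlengths(track_chromosomes, rlengths):
    """Give every chromosome in the track a length, defaulting to INT_MAX."""
    return {chrom: rlengths.get(chrom, INT_MAX) for chrom in track_chromosomes}


resolved = resolve_rlengths([b"chr1", b"chrM"], {b"chr1": 248_956_422})
```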
- size = None
- sort()
Sort fragments for each chromosome by genomic coordinate.
Fragments are ordered first by their left coordinate and then by their right coordinate. The is_sorted flag is set to True when sorting completes.
- total = None
- class MACS3.Signal.PairedEndTrack.PETrackII(anno='', buffer_size=100000)
Bases: object
Paired-end track for single-cell ATAC fragments with barcode metadata.
Each chromosome stores a structured array of fragment coordinates and counts alongside an integer-encoded barcode array to support barcode-aware analyses.
- add_loc(chromosome, start, end, barcode, count)
Append a fragment together with its barcode and count.
- Parameters:
chromosome (bytes) – Chromosome name (as bytes) for the fragment.
start (int) – Zero-based start coordinate of the fragment.
end (int) – Zero-based end coordinate of the fragment.
barcode (bytes) – Raw barcode sequence associated with the fragment.
count (int) – Number of occurrences represented by the fragment.
Notes
Barcodes are interned into integers via barcode_dict for compact storage and the accumulated template length is weighted by count.
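Barcode interning and count-weighted bookkeeping can be illustrated with the sketch below. `TinyPETrackII` is a hypothetical stand-in using lists and dictionaries, and assigning integer codes in order of first appearance is an assumption.

```python
class TinyPETrackII:
    """Hypothetical stand-in for PETrackII (lists instead of numpy arrays)."""

    def __init__(self):
        self.locations = {}     # chromosome -> [(left, right, count), ...]
        self.barcodes = {}      # chromosome -> [barcode code, ...], parallel to locations
        self.barcode_dict = {}  # raw barcode bytes -> integer code
        self.total = 0          # sum of counts
        self.length = 0         # count-weighted sum of template lengths

    def add_loc(self, chromosome, start, end, barcode, count):
        # Reuse the existing code for a known barcode, else assign the next integer.
        code = self.barcode_dict.setdefault(barcode, len(self.barcode_dict))
        self.locations.setdefault(chromosome, []).append((start, end, count))
        self.barcodes.setdefault(chromosome, []).append(code)
        self.total += count
        self.length += (end - start) * count


sc_track = TinyPETrackII()
sc_track.add_loc(b"chr1", 100, 250, b"AAACGG", 2)
sc_track.add_loc(b"chr1", 300, 400, b"TTGCAA", 1)
sc_track.add_loc(b"chr1", 500, 560, b"AAACGG", 3)
```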
- annotation = None
- average_template_length = None
- barcode_dict = None
- barcode_last_n: typedef
- barcodes = None
- buf_size = None
- buffer_size = None
- count_fraglengths()
Count fragment lengths weighted by per-fragment counts.
- Returns:
Mapping from fragment length to the total count contributed by fragments of that length.
- Return type:
dict
- destroy()
Release fragment and barcode arrays held by the track.
All per-chromosome arrays are resized to zero, barcode mappings are cleared, and the track is marked as destroyed.
- exclude(regions)
Remove fragments that overlap the provided exclusion regions.
- Parameters:
regions (MACS3.Signal.Region.Regions) – Sorted region collection whose intervals should be excluded.
Notes
The operation mutates the track in place, adjusts fragment counts and lengths, and finishes by calling finalize().
- finalize()
Shrink arrays, sort fragments, and refresh aggregate counters.
Each per-chromosome fragment array is resized to its observed length, sorted by ('l', 'r'), and the accompanying barcode array is reordered to match. The method updates total and average_template_length using count weights and marks the track as sorted.
- Raises:
AssertionError – If no fragments are present when finalizing.
- fraglengths()
Return all fragment lengths expanded by their counts.
- Returns:
Array of end - start values repeated according to the stored counts.
- Return type:
numpy.ndarray
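Expanding lengths by their counts, and the weighted tally reported by count_fraglengths(), can be sketched as follows. The function and the `(left, right, count)` tuples are illustrative assumptions, with a list standing in for the returned array.

```python
from collections import Counter


def fraglengths(fragments):
    """Repeat each fragment's length `count` times."""
    lengths = []
    for left, right, count in fragments:
        lengths.extend([right - left] * count)
    return lengths


fragments = [(10, 60, 2), (20, 120, 1)]
expanded = fraglengths(fragments)
# Tallying the expanded list gives the count-weighted length histogram.
weighted_counts = dict(Counter(expanded))
```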
- get_chr_names()
Return the set of chromosome names stored in the track.
- Returns:
Chromosome names (bytes) that currently have fragments.
- Return type:
set
- get_locations_by_chr(chromosome)
Return the fragment array for a chromosome.
- Parameters:
chromosome (bytes) – Chromosome name, provided as bytes.
- Returns:
Structured array with ('l', 'i4'), ('r', 'i4'), and ('c', 'u2') fields.
- Return type:
numpy.ndarray
- Raises:
Exception – If the chromosome is not present in the track.
- get_rlengths()
Return the reference chromosome lengths associated with the track.
- Returns:
Mapping from chromosome name (bytes) to reference length. Chromosomes without a recorded length default to INT_MAX.
- Return type:
dict
- is_destroyed: _fake_callable
- is_sorted = None
- length = None
- locations = None
- pileup_a_chromosome(chrom, scale_factor=1.0, baseline_value=0.0)
Compute a coverage pileup for a single chromosome.
- Parameters:
chrom (bytes) – Chromosome name to pile up.
scale_factor (float, optional) – Value used to scale the resulting coverage.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
- Returns:
Two-element list [positions, values] with numpy arrays describing the pileup breakpoints and scaled coverage.
- Return type:
list
- pileup_a_chromosome_c(chrom, ds, scale_factor_s, baseline_value=0.0)
Project paired-end fragments into pseudo single-end pileups.
- Parameters:
chrom (bytes) – Chromosome name to pile up.
ds (list[int]) – Fragment lengths used to build the projections.
scale_factor_s (list[float]) – Scale factors paired with each entry in ds.
baseline_value (float, optional) – Minimum value enforced on the coverage array.
- Returns:
Two-element list [positions, values] representing the merged pileup with the maximum value taken across projections.
- Return type:
list
- pileup_bdg(scale_factor=1.0, baseline_value=0.0)
Build a bedGraphTrackI with pileups for every chromosome.
- Parameters:
scale_factor (float, optional) – Value used to scale the coverage for each chromosome.
baseline_value (float, optional) – Minimum value enforced on the coverage arrays.
- Returns:
BedGraph track populated with per-chromosome pileup data.
- Return type:
bedGraphTrackI
- pileup_bdg2()
Build a bedGraphTrackII with pileups for every chromosome.
- Returns:
BedGraph track populated with per-chromosome pileup arrays and finalized.
- Return type:
bedGraphTrackII
- pileup_bdg_hmmr(mapping, baseline_value=0.0)
Generate HMMRATAC-style pileups for every chromosome.
- Parameters:
mapping (list) – Weight mapping produced by HMMRATAC EM training describing the short, mono-, di-, and tri-nucleosomal signals.
baseline_value (float, optional) – Reserved parameter for API compatibility; not currently applied.
- Returns:
List of dictionaries mirroring mapping where each dictionary maps chromosome names to pileup arrays returned by pileup_from_LR_hmmratac().
- Return type:
list
- rlengths = None
- sample_num(samplesize, seed=-1)
Down-sample fragments in place so total counts approximate samplesize.
- Parameters:
samplesize (int) – Target total count across all chromosomes.
seed (int, optional) – Deterministic seed forwarded to sample_percent().
Notes
The method converts samplesize into a sampling fraction using the current total count and reuses sample_percent().
- sample_num_copy(samplesize, seed=-1)
Return a down-sampled copy whose total counts approximate samplesize.
- Parameters:
samplesize (int) – Target total count across all chromosomes.
seed (int, optional) – Deterministic seed forwarded to sample_percent_copy().
- Returns:
New track containing the sampled fragments.
- Return type:
PETrackII
- sample_percent(percent, seed=-1)
Down-sample fragments in place so counts reflect a given percentage.
- Parameters:
percent (float) – Fraction of total counts to keep per chromosome between 0 and 1 (inclusive).
seed (int, optional) – Deterministic seed for the RNG; a negative value uses NumPy’s global state.
Notes
Fragments are sampled proportionally to their counts by expanding to an index vector, shuffling, and collapsing counts for the retained entries. Aggregate statistics are recomputed and the result is resorted.
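The expand, shuffle, and collapse idea from the note can be sketched for a single chromosome as follows. It uses Python's `random` module instead of NumPy and omits barcode handling for brevity; the rounding rule for the kept total is an assumption.

```python
import random
from collections import Counter


def sample_counts(fragments, percent, seed=-1):
    """Down-sample (left, right, count) fragments proportionally to their counts."""
    rng = random.Random(seed) if seed >= 0 else random
    # Expand: one index entry per unit of count.
    index = [i for i, (_, _, count) in enumerate(fragments) for _ in range(count)]
    rng.shuffle(index)
    keep = int(round(len(index) * percent))
    # Collapse: count how many units of each fragment survived.
    kept = Counter(index[:keep])
    return sorted((fragments[i][0], fragments[i][1], n) for i, n in kept.items())


sc_fragments = [(10, 60, 3), (20, 90, 1), (40, 100, 4)]
sc_sampled = sample_counts(sc_fragments, 0.5, seed=11)
```

Fragments whose units are all dropped vanish from the result, so the number of rows can shrink as well as the counts.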
- sample_percent_copy(percent, seed=-1)
Return a down-sampled copy whose counts reflect a given percentage.
- Parameters:
percent (float) – Fraction of total counts to keep per chromosome between 0 and 1 (inclusive).
seed (int, optional) – Deterministic seed for the RNG; a negative value uses NumPy’s global state.
- Returns:
New track containing the sampled fragments with metadata preserved.
- Return type:
PETrackII
Notes
Fragments are sampled proportionally to their counts and the returned track is sorted with reference lengths copied from the source track.
- set_rlengths(rlengths)
Attach reference chromosome lengths to the track.
- Parameters:
rlengths (dict) – Mapping from chromosome name (bytes) to reference length.
- Returns:
True when the length mapping has been updated.
- Return type:
bool
Notes
Any chromosome stored in the track but missing from rlengths is assigned INT_MAX so downstream bounds checks can succeed.
- size = None
- sort()
Sort fragments and barcodes for each chromosome.
Fragments are ordered first by their left coordinate and then by their right coordinate, and the barcode array is reordered alongside the fragment array. The is_sorted flag is set to True when sorting completes.
- subset(selected_barcodes)
Build a new track containing only fragments from selected barcodes.
- Parameters:
selected_barcodes (set) – Set of barcode byte strings to retain.
- Returns:
New track restricted to the provided barcodes with metadata preserved.
- Return type:
PETrackII
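Barcode-based subsetting can be pictured with the sketch below. It assumes fragments and integer barcode codes are held in parallel per-chromosome lists and that unknown barcodes are silently ignored; the standalone function returning two dictionaries is hypothetical, whereas the real method returns a new track.

```python
def subset(locations, barcodes, barcode_dict, selected_barcodes):
    """Keep only fragments whose barcode is in `selected_barcodes`."""
    # Translate raw barcodes to their integer codes; unknown ones are skipped.
    codes = {barcode_dict[b] for b in selected_barcodes if b in barcode_dict}
    new_locations, new_barcodes = {}, {}
    for chrom, fragments in locations.items():
        pairs = [(f, c) for f, c in zip(fragments, barcodes[chrom]) if c in codes]
        if pairs:
            new_locations[chrom] = [f for f, _ in pairs]
            new_barcodes[chrom] = [c for _, c in pairs]
    return new_locations, new_barcodes


locs = {b"chr1": [(10, 60, 1), (20, 90, 2), (40, 100, 1)]}
codes_by_chrom = {b"chr1": [0, 1, 0]}
code_of = {b"AAAC": 0, b"GGTT": 1}
sub_locs, sub_codes = subset(locs, codes_by_chrom, code_of, {b"GGTT"})
```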
- total = None
- MACS3.Signal.PairedEndTrack.bool(*args, **kwargs)