MACS3.IO.BAM module
Utilities for reading BAM files and their BAI indexes in MACS3.
This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).
- class MACS3.IO.BAM.BAIFile
Bases: object
In-memory representation of a BAM index (BAI) file.
- filename
Path to the .bai file.
- magic
File magic header (should be b"BAI\1").
- n_ref
Number of reference sequences.
- metadata
Pseudo-bin metadata for each reference sequence.
- n_bins
Total number of bins across references.
- n_chunks
Total number of chunks across references.
- n_mapped
Total mapped reads.
- n_unmapped
Total unmapped reads.
- bins
List of bin IDs.
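As a rough illustration of the magic and n_ref attributes above, the sketch below reads the first eight bytes of a BAI stream as laid out in the SAM/BAM specification (a 4-byte magic followed by a little-endian 32-bit reference count). The function name read_bai_header is hypothetical and is not part of MACS3; how BAIFile performs this check internally is not documented here.

```python
import io
import struct


def read_bai_header(stream):
    """Read and validate the BAI magic, then return (magic, n_ref).

    Hypothetical helper for illustration only; not a MACS3 function.
    """
    magic = stream.read(4)
    if magic != b"BAI\x01":  # same bytes as b"BAI\1"
        raise ValueError(f"not a BAI file: magic={magic!r}")
    (n_ref,) = struct.unpack("<i", stream.read(4))  # little-endian int32
    return magic, n_ref


# Toy in-memory "file" declaring 25 reference sequences.
fake_bai = io.BytesIO(b"BAI\x01" + struct.pack("<i", 25))
magic, n_ref = read_bai_header(fake_bai)
print(magic, n_ref)
```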
- get_chunks_by_bin(ref_n, bin_n)
Return sorted BGZF chunks for bin_n on reference ref_n.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
bin_n (cython.uint) – Bin identifier.
- Returns:
Sorted list of BGZF chunks for the bin.
- Return type:
list
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    chunks = bai.get_chunks_by_bin(ref_n=0, bin_n=4681)
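The bin identifier 4681 used in the example is, under the standard BAI binning scheme from the SAM/BAM specification, the first of the smallest (16 kb) bins. The sketch below is a Python transcription of that scheme's bin calculation; the name reg2bin follows the specification's pseudocode and is assumed here, not taken from the MACS3 API.

```python
def reg2bin(beg: int, end: int) -> int:
    """Smallest standard BAI bin that fully contains [beg, end).

    Coordinates are 0-based and half-open. Illustrative transcription of
    the SAM specification's scheme; not a MACS3 function.
    """
    end -= 1  # work with an inclusive end
    if beg >> 14 == end >> 14:
        return 4681 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return 585 + (beg >> 17)   # 128 kb bins
    if beg >> 20 == end >> 20:
        return 73 + (beg >> 20)    # 1 Mb bins
    if beg >> 23 == end >> 23:
        return 9 + (beg >> 23)     # 8 Mb bins
    if beg >> 26 == end >> 26:
        return 1 + (beg >> 26)     # 64 Mb bins
    return 0                       # whole-reference bin


print(reg2bin(0, 16_384))  # interval fits in the first 16 kb bin
print(reg2bin(0, 16_385))  # one base longer, so it spills into a 128 kb bin
```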
- get_chunks_by_list_of_bins(ref_n, bins)
Return sorted chunks for the unique set of bins provided.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
bins (list) – List of bin identifiers.
- Returns:
Sorted list of BGZF chunks for the bins.
- Return type:
list
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    bins = [4681, 4682, 585]
    chunks = bai.get_chunks_by_list_of_bins(ref_n=0, bins=bins)
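The "unique set of bins" behaviour can be pictured with a toy index. The sketch below assumes each chunk is a (start, end) tuple of virtual offsets and that the index is a plain dict from bin ID to chunk list; both the data layout and the name chunks_for_bins are assumptions made for illustration, not the MACS3 internals.

```python
def chunks_for_bins(bin_to_chunks, bins):
    """Collect the chunks of every distinct bin in `bins`, sorted by start.

    Hypothetical helper: `bin_to_chunks` maps bin ID -> list of
    (start_voffset, end_voffset) tuples.
    """
    chunks = []
    for bin_id in set(bins):  # repeated bin IDs are visited only once
        chunks.extend(bin_to_chunks.get(bin_id, []))  # unknown bins add nothing
    return sorted(chunks)


toy_index = {4681: [(100, 200)], 585: [(50, 80), (300, 400)]}
result = chunks_for_bins(toy_index, [4681, 585, 4681, 9])
print(result)
```

Passing bin 4681 twice does not duplicate its chunk, and bin 9, which is absent from the toy index, is ignored.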
- get_chunks_by_list_of_regions(ref_n, regions)
Return BGZF chunks overlapping any region in regions.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
regions (list) – Iterable of (beg, end) tuples.
- Returns:
Sorted list of BGZF chunk tuples covering the regions.
- Return type:
list
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    regions = [(1_000, 2_000), (50_000, 55_000)]
    chunks = bai.get_chunks_by_list_of_regions(ref_n=0, regions=regions)
- get_chunks_by_region(ref_n, beg, end)
Return BGZF chunks overlapping [beg, end) on reference ref_n.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
beg (cython.uint) – Start coordinate.
end (cython.uint) – End coordinate.
- Returns:
Sorted list of BGZF chunks covering the region.
- Return type:
list
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    chunks = bai.get_chunks_by_region(ref_n=0, beg=1_000_000, end=1_010_000)
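A region lookup in a BAI index usually starts by listing every bin that could contain an overlapping alignment, one or more per level of the binning hierarchy. The sketch below transcribes the corresponding routine from the SAM/BAM specification; the name reg2bins comes from that specification and is an assumption here, and this page does not state how MACS3 implements the step.

```python
def reg2bins(beg: int, end: int) -> list[int]:
    """List all standard BAI bins that may overlap [beg, end).

    Coordinates are 0-based and half-open. Illustrative only; not a
    MACS3 function.
    """
    end -= 1  # inclusive end
    bins = [0]  # bin 0 spans the whole reference
    # (first bin ID of the level, shift giving the bin width at that level)
    for first_id, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(first_id + (beg >> shift), first_id + (end >> shift) + 1))
    return bins


print(reg2bins(1_000_000, 1_010_000))
```

For the region used in the example above, this yields one candidate bin per level because the 10 kb interval does not cross a bin boundary at any level.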
- get_coffset_by_region(ref_n, beg, end)
Return the BGZF compressed offset for the leftmost overlapping block.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
beg (cython.uint) – Start coordinate.
end (cython.uint) – End coordinate.
- Returns:
Compressed BGZF block offset, or 0 if no chunks overlap.
- Return type:
int
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    coffset = bai.get_coffset_by_region(ref_n=0, beg=1_000_000, end=1_010_000)
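In the BGZF format, a 64-bit virtual offset packs the compressed block offset (coffset) into its upper 48 bits and the position inside the decompressed block into its lower 16 bits. The sketch below shows that split and one plausible way to pick the leftmost block from a chunk list, returning 0 when the list is empty as described above. Both function names, and the assumption that chunks are (start, end) tuples of virtual offsets, are illustrative rather than MACS3 API.

```python
def split_virtual_offset(voffset: int) -> tuple[int, int]:
    """Split a BGZF virtual offset into (coffset, uoffset)."""
    return voffset >> 16, voffset & 0xFFFF


def leftmost_coffset(chunks) -> int:
    """Compressed offset of the earliest chunk start, or 0 if there are none.

    Hypothetical helper: `chunks` holds (start_voffset, end_voffset) tuples.
    """
    if not chunks:
        return 0
    return min(start for start, _end in chunks) >> 16


toy_chunks = [
    ((5000 << 16) | 12, (7000 << 16) | 0),
    ((1200 << 16) | 300, (1300 << 16) | 40),
]
print(split_virtual_offset(toy_chunks[1][0]))
print(leftmost_coffset(toy_chunks))
```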
- get_coffsets_by_list_of_regions(ref_n, regions)
Return compressed offsets for the leftmost block of each region.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
regions (list) – List of (beg, end) tuples.
- Returns:
Compressed offsets for each region, in input order.
- Return type:
list
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    regions = [(1_000, 2_000), (50_000, 55_000)]
    coffsets = bai.get_coffsets_by_list_of_regions(ref_n=0, regions=regions)
- get_metadata_by_refseq(ref_n)
Return pseudo-bin metadata for reference ref_n.
- Parameters:
ref_n (cython.uint) – Reference index in the BAI.
- Returns:
Metadata for the reference.
- Return type:
dict
Examples
    from MACS3.IO.BAM import BAIFile
    bai = BAIFile("example.bam.bai")
    bai.open()
    bai.read()
    meta = bai.get_metadata_by_refseq(ref_n=0)
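For context, the SAM/BAM specification stores per-reference metadata in a pseudo-bin with the reserved ID 37450, holding two "chunks": the virtual offsets where the reference's reads begin and end, followed by the mapped and unmapped read counts. The sketch below decodes such a record from raw bytes. The function name and the dictionary keys are hypothetical; the keys actually returned by get_metadata_by_refseq are not documented here.

```python
import struct

PSEUDO_BIN = 37450  # reserved metadata bin ID in the BAI format


def parse_pseudo_bin(buf: bytes, pos: int = 0) -> dict:
    """Decode one BAI metadata pseudo-bin starting at `pos`.

    Hypothetical helper; the key names below are chosen for illustration.
    """
    bin_id, n_chunk = struct.unpack_from("<II", buf, pos)
    if bin_id != PSEUDO_BIN or n_chunk != 2:
        raise ValueError("not a metadata pseudo-bin")
    ref_beg, ref_end, n_mapped, n_unmapped = struct.unpack_from("<4Q", buf, pos + 8)
    return {
        "ref_beg": ref_beg,        # virtual offset of the first read
        "ref_end": ref_end,        # virtual offset past the last read
        "n_mapped": n_mapped,
        "n_unmapped": n_unmapped,
    }


# Build a toy record and decode it again.
raw = struct.pack("<II4Q", PSEUDO_BIN, 2, 1 << 16, 9 << 16, 1234, 56)
toy_meta = parse_pseudo_bin(raw)
print(toy_meta)
```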
- class MACS3.IO.BAM.BAMaccessor
Bases: object
Random-access BAM reader backed by a matching BAI index.
The accessor reads headers via gzip for compatibility, but seeks directly to BGZF blocks when fetching alignments for specific regions.
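To make "seeks directly to BGZF blocks" concrete, the sketch below reads one BGZF block at a given compressed offset, following the block layout in the SAM/BAM specification: a gzip header with a "BC" extra subfield holding the block size minus one, then raw-deflate data, CRC32 and uncompressed size. It also returns the offset of the following block, which is the kind of value the noffset_cache attribute describes. The functions read_bgzf_block and make_bgzf_block are hypothetical helpers written for this illustration, not the accessor's actual code.

```python
import io
import struct
import zlib


def read_bgzf_block(stream, coffset):
    """Return (decompressed bytes, offset of the next block)."""
    stream.seek(coffset)
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(
        "<BBBBIBBH", stream.read(12)
    )
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a BGZF block")
    extra = stream.read(xlen)
    bsize, pos = None, 0
    while pos + 4 <= xlen:  # walk the extra subfields looking for "BC"
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", extra, pos + 4)
        pos += 4 + slen
    if bsize is None:
        raise ValueError("missing BC subfield")
    cdata = stream.read(bsize - xlen - 19)  # compressed payload length
    crc, isize = struct.unpack("<II", stream.read(8))
    data = zlib.decompress(cdata, wbits=-15)  # raw deflate, no zlib header
    if zlib.crc32(data) != crc or len(data) != isize:
        raise ValueError("corrupt BGZF block")
    return data, coffset + bsize + 1


def make_bgzf_block(payload: bytes) -> bytes:
    """Build a minimal BGZF block, used only to exercise the reader."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(payload) + comp.flush()
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    extra = struct.pack("<BBHH", 66, 67, 2, len(cdata) + 25)  # BSIZE = total - 1
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


first = make_bgzf_block(b"first block")
toy_bam = io.BytesIO(first + make_bgzf_block(b"second block"))
data1, next_offset = read_bgzf_block(toy_bam, 0)
data2, _ = read_bgzf_block(toy_bam, next_offset)
print(data1, next_offset, data2)
```

Because each block records its own size, a reader can jump straight to any compressed offset supplied by the index without decompressing what comes before it.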
- bam_filename
Path to the BAM file.
- bai_filename
Path to the .bai file.
- bamfile
BAM file handle, opened in "rb" mode.
- baifile
BAI file handler.
- references
Reference/chromosome names in BAM order.
- rlengths
Lengths of reference/chromosomes.
- bgzf_block_cache
Cache of the most recently decompressed BGZF block.
- coffset_cache
Compressed offset (coffset) of the cached BGZF block.
- noffset_cache
Compressed offset of the block that follows the cached BGZF block.
- close()
Close the underlying BAM stream.
- Returns:
None
- get_chromosomes()
Return reference names in header order.
- Returns:
Reference/chromosome names.
- Return type:
list
- get_reads_in_region(chrom, left, right, maxDuplicate=1)
Return alignments overlapping [left, right) on chrom.
- Parameters:
chrom (bytes) – Chromosome name matching the BAM header.
left (cython.int) – 0-based inclusive start coordinate.
right (cython.int) – 0-based exclusive end coordinate.
maxDuplicate (cython.int) – Maximum number of identical alignments to retain.
- Returns:
Alignments overlapping the region.
- Return type:
list
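The half-open overlap test and the effect of maxDuplicate can be sketched on toy data. The example below models each read as a (start, end, strand) tuple and treats reads with identical tuples as duplicates; that definition of "identical", the tuple layout and the name filter_reads are all assumptions for illustration, since this page does not specify how the accessor compares alignments.

```python
from collections import Counter


def filter_reads(reads, left, right, max_duplicate=1):
    """Keep reads overlapping [left, right), capping identical copies.

    Hypothetical helper: each read is a (start, end, strand) tuple with
    0-based, half-open coordinates.
    """
    seen = Counter()
    kept = []
    for read in reads:
        start, end, _strand = read
        if start >= right or end <= left:  # no overlap with [left, right)
            continue
        seen[read] += 1
        if seen[read] <= max_duplicate:
            kept.append(read)
    return kept


toy_reads = [
    (90, 140, "+"),
    (100, 150, "+"),
    (100, 150, "+"),  # exact duplicate of the previous read
    (100, 150, "-"),
    (200, 250, "+"),  # starts at `right`, so it does not overlap
]
kept_reads = filter_reads(toy_reads, left=100, right=200)
print(kept_reads)
```

With the default max_duplicate=1 the second copy of (100, 150, "+") is dropped; raising the limit to 2 would retain it.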
- get_rlengths()
Return reference lengths keyed by reference name.
- Returns:
Mapping of reference name to length.
- Return type:
dict