MACS3.IO.BAM module

Utilities for reading BAM files and their BAI indexes in MACS3.

This code is free software; you can redistribute it and/or modify it under the terms of the BSD License (see the file LICENSE included with the distribution).

class MACS3.IO.BAM.BAIFile

Bases: object

In-memory representation of a BAM index (BAI) file.

filename

Path to the .bai file.

magic

File magic header (should be b"BAI\1").

n_ref

Number of reference sequences.

metadata

Pseudo-bin metadata for each reference sequence.

n_bins

Total number of bins across references.

n_chunks

Total number of chunks across references.

n_mapped

Total mapped reads.

n_unmapped

Total unmapped reads.

bins

List of bin IDs.
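The magic and n_ref attributes correspond to the first eight bytes of a BAI file as laid out in the SAM/BAM specification. The following is a minimal sketch of that check, not the MACS3 implementation: read_bai_header is a hypothetical helper, and an in-memory byte string stands in for a real index file.

```python
import io
import struct

BAI_MAGIC = b"BAI\x01"  # same bytes as b"BAI\1"


def read_bai_header(stream):
    """Return (magic, n_ref) from the start of a BAI byte stream.

    Hypothetical helper for illustration; not part of MACS3.
    """
    magic = stream.read(4)
    if magic != BAI_MAGIC:
        raise ValueError(f"not a BAI file, magic was {magic!r}")
    # n_ref is stored as a little-endian 32-bit integer right after the magic.
    (n_ref,) = struct.unpack("<i", stream.read(4))
    return magic, n_ref


# Toy index header claiming three reference sequences.
toy_header = io.BytesIO(BAI_MAGIC + struct.pack("<i", 3))
magic, n_ref = read_bai_header(toy_header)
```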

get_chunks_by_bin(ref_n, bin_n)

Return sorted BGZF chunks for bin_n on reference ref_n.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • bin_n (cython.uint) – Bin identifier.

Returns:

Sorted list of BGZF chunks for the bin.

Return type:

list
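Bin identifiers follow the binning scheme described in the SAM/BAM specification, where bin 0 spans 512 Mbp and bins 4681 and above each span 16 kbp. The sketch below is independent of MACS3 and uses a hypothetical helper, bin_interval, to show which genomic interval a given bin_n covers.

```python
# First bin ID at each of the six levels of the standard BAI binning scheme.
LEVEL_OFFSETS = (0, 1, 9, 73, 585, 4681)
MAX_BIN = 37448  # last regular bin; 37450 is reserved for the metadata pseudo-bin


def bin_interval(bin_n):
    """Return (level, beg, end) for a regular bin, with [beg, end) 0-based.

    Hypothetical helper for illustration; not part of MACS3.
    """
    if not 0 <= bin_n <= MAX_BIN:
        raise ValueError(f"not a regular bin: {bin_n}")
    # Walk from the finest level down to the coarsest one.
    for level in range(5, -1, -1):
        first = LEVEL_OFFSETS[level]
        if bin_n >= first:
            size = 1 << (29 - 3 * level)  # 512 Mbp at level 0, 16 kbp at level 5
            beg = (bin_n - first) * size
            return level, beg, beg + size


finest = bin_interval(4681)
coarser = bin_interval(585)
```

Under this scheme, bin 4681 is the first 16 kbp bin and bin 585 is the first 128 kbp bin.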

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
chunks = bai.get_chunks_by_bin(ref_n=0, bin_n=4681)

get_chunks_by_list_of_bins(ref_n, bins)

Return sorted chunks for the unique set of bins provided.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • bins (list) – List of bin identifiers; duplicates are ignored.

Returns:

Sorted list of BGZF chunks for the bins.

Return type:

list
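The "unique set of bins, then sort" behaviour can be pictured with a small stand-alone sketch. It assumes, purely for illustration, that the index is a dict mapping bin ID to a list of (chunk_beg, chunk_end) tuples; the actual internal layout in MACS3 may differ, and chunks_for_bins is a hypothetical name.

```python
def chunks_for_bins(bin_to_chunks, bins):
    """Collect chunks for each distinct bin and return them sorted.

    bin_to_chunks is an assumed dict of bin ID -> list of (beg, end) tuples.
    """
    chunks = []
    for bin_n in set(bins):  # each bin contributes once, even if repeated
        chunks.extend(bin_to_chunks.get(bin_n, []))
    return sorted(chunks)


# Toy index for one reference; the offsets are made-up numbers.
toy_index = {
    585: [(900, 950)],
    4681: [(100, 200), (400, 450)],
    4682: [(200, 300)],
}
chunks = chunks_for_bins(toy_index, [4681, 4682, 585, 4681])
```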

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
bins = [4681, 4682, 585]
chunks = bai.get_chunks_by_list_of_bins(ref_n=0, bins=bins)

get_chunks_by_list_of_regions(ref_n, regions)

Return BGZF chunks overlapping any region in regions.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • regions (list) – List of (beg, end) tuples.

Returns:

Sorted list of BGZF chunk tuples covering the regions.

Return type:

list

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
regions = [(1_000, 2_000), (50_000, 55_000)]
chunks = bai.get_chunks_by_list_of_regions(ref_n=0, regions=regions)

get_chunks_by_region(ref_n, beg, end)

Return BGZF chunks overlapping [beg, end) on reference ref_n.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • beg (cython.uint) – Start coordinate (inclusive).

  • end (cython.uint) – End coordinate (exclusive).

Returns:

Sorted list of BGZF chunks covering the region.

Return type:

list
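Looking up chunks for a region first requires the list of bins that may overlap it. The SAM/BAM specification gives a reference routine for this (reg2bins); the Python transcription below is a sketch of that published algorithm and is not claimed to be how MACS3 implements the lookup.

```python
def reg2bins(beg, end):
    """Return all bin IDs that may overlap the half-open region [beg, end).

    Python transcription of the reg2bins routine from the SAM/BAM specification.
    """
    end -= 1  # switch to an inclusive end coordinate
    bins = [0]  # bin 0 covers the whole 512 Mbp range
    # (shift, first bin ID) for levels 1 to 5 of the binning scheme.
    for shift, first in ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681)):
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins


bins = reg2bins(1_000_000, 1_010_000)
```

Each returned bin would then be looked up in the index for the chosen reference, and the resulting chunks merged and sorted.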

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
chunks = bai.get_chunks_by_region(ref_n=0, beg=1_000_000, end=1_010_000)

get_coffset_by_region(ref_n, beg, end)

Return the BGZF compressed offset for the leftmost overlapping block.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • beg (cython.uint) – Start coordinate.

  • end (cython.uint) – End coordinate.

Returns:

Compressed BGZF block offset, or 0 if no chunks overlap.

Return type:

int
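In the BAM format, a chunk boundary is a 64-bit virtual offset whose upper 48 bits are the compressed offset of a BGZF block (the coffset) and whose lower 16 bits are the position inside the decompressed block. The sketch below assumes chunks are (chunk_beg, chunk_end) pairs of virtual offsets; both helper names are hypothetical and the code only illustrates the arithmetic, including the documented fallback of 0 when nothing overlaps.

```python
def split_virtual_offset(voffset):
    """Split a BGZF virtual offset into (coffset, uoffset)."""
    return voffset >> 16, voffset & 0xFFFF


def leftmost_coffset(chunks):
    """Return the smallest compressed offset among chunk starts, or 0 if empty.

    chunks is assumed to be a list of (chunk_beg, chunk_end) virtual offsets.
    """
    if not chunks:
        return 0
    return min(chunk_beg for chunk_beg, _ in chunks) >> 16


# Two made-up chunks starting in the blocks at compressed offsets 500 and 200.
toy_chunks = [((500 << 16) | 10, 600 << 16), ((200 << 16) | 7, 300 << 16)]
coffset = leftmost_coffset(toy_chunks)
```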

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
coffset = bai.get_coffset_by_region(ref_n=0, beg=1_000_000, end=1_010_000)

get_coffsets_by_list_of_regions(ref_n, regions)

Return compressed offsets for the leftmost block of each region.

Parameters:
  • ref_n (cython.uint) – Reference index in the BAI.

  • regions (list) – List of (beg, end) tuples.

Returns:

Compressed offsets for each region, in input order.

Return type:

list

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
regions = [(1_000, 2_000), (50_000, 55_000)]
coffsets = bai.get_coffsets_by_list_of_regions(ref_n=0, regions=regions)

get_metadata_by_refseq(ref_n)

Return pseudo-bin metadata for reference ref_n.

Parameters:

ref_n (cython.uint) – Reference index in the BAI.

Returns:

Metadata for the reference.

Return type:

dict
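In the SAM/BAM specification, the pseudo-bin has ID 37450 and stores two chunk-shaped records: the first holds the virtual offsets where the reference's reads begin and end, the second holds the mapped and unmapped read counts. The sketch below decodes such a record from raw bytes. The dictionary keys and the helper name parse_pseudo_bin are assumptions for illustration; the keys MACS3 actually returns are not stated here.

```python
import struct

PSEUDO_BIN = 37450  # reserved bin ID for per-reference metadata


def parse_pseudo_bin(record):
    """Decode one 40-byte pseudo-bin record into a dict (hypothetical keys)."""
    bin_id, n_chunk, ref_beg, ref_end, n_mapped, n_unmapped = struct.unpack(
        "<II4Q", record
    )
    if bin_id != PSEUDO_BIN or n_chunk != 2:
        raise ValueError("not a pseudo-bin record")
    return {
        "ref_beg": ref_beg,        # virtual offset of the first read
        "ref_end": ref_end,        # virtual offset just past the last read
        "n_mapped": n_mapped,      # number of mapped reads on this reference
        "n_unmapped": n_unmapped,  # number of unmapped reads placed here
    }


# Made-up record: reads span virtual offsets 1<<16 to 9<<16, 1200 mapped, 34 unmapped.
record = struct.pack("<II4Q", PSEUDO_BIN, 2, 1 << 16, 9 << 16, 1200, 34)
meta = parse_pseudo_bin(record)
```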

Examples

from MACS3.IO.BAM import BAIFile
bai = BAIFile("example.bam.bai")
bai.open()
bai.read()
meta = bai.get_metadata_by_refseq(ref_n=0)

class MACS3.IO.BAM.BAMaccessor

Bases: object

Random-access BAM reader backed by a matching BAI index.

The accessor reads headers via gzip for compatibility, but seeks directly to BGZF blocks when fetching alignments for specific regions.
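Both access styles work because a BGZF file is a series of independent gzip members, so decompression can start at any block boundary. The sketch below illustrates the idea with two ordinary gzip members in memory rather than a real BAM file; read_member_at is a hypothetical helper, and unlike a real BGZF reader it does not use the block-size field stored in each block header.

```python
import gzip
import io
import zlib


def read_member_at(stream, coffset):
    """Decompress the single gzip member starting at coffset.

    Returns (data, next_coffset). Hypothetical helper for illustration only.
    """
    stream.seek(coffset)
    raw = stream.read()  # a real reader would read just one block
    decomp = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    data = decomp.decompress(raw)
    # Bytes left over after the member ends belong to the following block.
    next_coffset = coffset + len(raw) - len(decomp.unused_data)
    return data, next_coffset


first = gzip.compress(b"header bytes")
second = gzip.compress(b"alignment bytes")
stream = io.BytesIO(first + second)

# Jump straight to the second member without touching the first.
data, next_coffset = read_member_at(stream, len(first))
```

The returned pair mirrors the kind of bookkeeping a block cache needs: the offset a block was read from and the offset of the block after it.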

bam_filename

Path to the BAM file.

bai_filename

Path to the .bai file.

bamfile

BAM file handle, opened in "rb" mode.

baifile

BAI file handle.

references

Reference/chromosome names in BAM order.

rlengths

Lengths of the reference sequences/chromosomes.

bgzf_block_cache

Cache of the most recently decompressed BGZF block.

coffset_cache

Compressed offset (coffset) of the cached BGZF block.

noffset_cache

Compressed offset (coffset) of the block that follows the cached BGZF block.

close()

Close the underlying BAM stream.

Returns:

None

get_chromosomes()

Return reference names in header order.

Returns:

Reference/chromosome names.

Return type:

list

get_reads_in_region(chrom, left, right, maxDuplicate=1)

Return alignments overlapping [left, right) on chrom.

Parameters:
  • chrom (bytes) – Chromosome name matching the BAM header.

  • left (cython.int) – 0-based inclusive start coordinate.

  • right (cython.int) – 0-based exclusive end coordinate.

  • maxDuplicate (cython.int) – Maximum number of identical alignments to retain.

Returns:

List of MACS3.Signal.ReadAlignment.ReadAlignment objects.

Return type:

list
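The combined effect of the half-open window and maxDuplicate can be sketched without MACS3. The example assumes that reads are plain (start, end, strand) tuples and that "identical" means all three fields match; how MACS3 defines identity for ReadAlignment objects is not stated here, and select_reads is a hypothetical helper.

```python
from collections import Counter


def select_reads(reads, left, right, max_duplicate=1):
    """Keep reads overlapping [left, right), capping identical copies.

    reads is an assumed list of (start, end, strand) tuples, 0-based half-open.
    """
    seen = Counter()
    kept = []
    for read in reads:
        start, end, _strand = read
        if end <= left or start >= right:
            continue  # no overlap with the half-open window
        if seen[read] >= max_duplicate:
            continue  # enough copies of this alignment already kept
        seen[read] += 1
        kept.append(read)
    return kept


reads = [(100, 150, "+"), (100, 150, "+"), (140, 190, "-"), (300, 350, "+")]
kept = select_reads(reads, left=120, right=200, max_duplicate=1)
```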

get_rlengths()

Return reference lengths keyed by reference name.

Returns:

Mapping of reference name to length.

Return type:

dict
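Reference names and lengths come from the binary BAM header, whose layout is defined in the SAM/BAM specification. The sketch below parses that layout from an already decompressed byte stream and builds a name-to-length mapping. read_references is a hypothetical helper, the input bytes are hand-built, and using bytes keys is an assumption made here to match the bytes chromosome names used by get_reads_in_region.

```python
import io
import struct


def read_references(stream):
    """Return {name: length} from a decompressed BAM header stream.

    Hypothetical helper for illustration; not part of MACS3.
    """
    if stream.read(4) != b"BAM\x01":
        raise ValueError("not a BAM header")
    (l_text,) = struct.unpack("<I", stream.read(4))
    stream.read(l_text)  # skip the plain-text SAM header
    (n_ref,) = struct.unpack("<I", stream.read(4))
    rlengths = {}
    for _ in range(n_ref):
        (l_name,) = struct.unpack("<I", stream.read(4))
        name = stream.read(l_name)[:-1]  # drop the trailing NUL byte
        (l_ref,) = struct.unpack("<I", stream.read(4))
        rlengths[name] = l_ref
    return rlengths


def pack_ref(name, length):
    """Encode one reference entry for the toy header."""
    return struct.pack("<I", len(name) + 1) + name + b"\x00" + struct.pack("<I", length)


text = b"@HD\tVN:1.6\n"
toy = (
    b"BAM\x01"
    + struct.pack("<I", len(text)) + text
    + struct.pack("<I", 2)
    + pack_ref(b"chr1", 1000)
    + pack_ref(b"chr2", 500)
)
rlengths = read_references(io.BytesIO(toy))
```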