hmmratac
HMM model JSON file format
The HMM trained by hmmratac can be saved to a JSON file and loaded later. This lets users take a well-trained hidden Markov model, one that captures the signals and relationships among open chromatin regions, nucleosomes, and background in a high-quality dataset, and reuse it on other ATAC-seq datasets. The JSON data from hmmratac is a JSON dictionary containing:
- hmm_type: either ‘gaussian’ or ‘poisson’, naming the emission model.
- hmm_binsize: the bin size in base pairs, used to sample the signals across the genome.
- n_features: fixed at 4. In hmmratac, the features used to train the HMM are the short-fragment, mono-nucleosomal, di-nucleosomal, and tri-nucleosomal signals.
- i_open_region / i_nucleosomal_region / i_background_region: the index numbers, starting from 0, of the three states (open region, nucleosomal region, and background region) in the emission and transition matrix data.
- startprob: a list of the initial probabilities of the HMM states at the first bin of a candidate region for decoding. Ideally, the first bin should be more likely to be in the background state and less likely to be in the open state. Check the index numbers in i_open_region, i_nucleosomal_region, and i_background_region to find which value corresponds to which state.
- transmat: the transition matrix (a list of lists) giving the probability that a state transitions to another state or stays in the same state. It is always a 3x3 matrix. To find the transition probability from state A to state B, look up the index numbers of A and B, take the list at index i_state_A, then take the value at index i_state_B within that list.
- lambda: only available if the model is ‘poisson’. It contains the lambda values of the Poisson models, which represent the emission model of each of the three states, so it is a 3 (states) x 4 (features) matrix. Check the index numbers in i_open_region, i_nucleosomal_region, and i_background_region to find which list corresponds to which state.
- covariance_type: only available if the model is ‘gaussian’, and currently always ‘full’. ‘full’ means each state uses a full covariance matrix.
- mean and covars: only available if the model is ‘gaussian’. With a Gaussian emission model, each state has a mean value and a 4x4 matrix for the full covariance matrix. Check the index numbers in i_open_region, i_nucleosomal_region, and i_background_region to find which mean value or 4x4 matrix corresponds to which state.
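The index lookups described for startprob and transmat can be sketched in a few lines of Python. This is a minimal illustration, not part of hmmratac: the helper names (`state_indices`, `start_prob`, `transition_prob`) are hypothetical, and the model below is a made-up, trimmed example with invented numbers that leaves out the emission parameters. For a real model file, replace `json.loads(example_json)` with `json.load()` on the opened file.

```python
import json

# JSON keys that hold the index of each state, as listed in the format above.
STATE_KEYS = {
    "open": "i_open_region",
    "nucleosomal": "i_nucleosomal_region",
    "background": "i_background_region",
}


def state_indices(model):
    """Map readable state names to their 0-based index in the model."""
    return {name: model[key] for name, key in STATE_KEYS.items()}


def start_prob(model, state):
    """Initial probability of `state` at the first bin of a region."""
    return model["startprob"][state_indices(model)[state]]


def transition_prob(model, from_state, to_state):
    """Probability of moving from `from_state` to `to_state`."""
    idx = state_indices(model)
    # Row: the list for the source state; column: the target state.
    return model["transmat"][idx[from_state]][idx[to_state]]


# Hypothetical model: all values are invented for illustration only,
# and the emission parameters are omitted to keep the example short.
example_json = """
{
  "hmm_type": "poisson",
  "hmm_binsize": 10,
  "n_features": 4,
  "i_open_region": 2,
  "i_nucleosomal_region": 1,
  "i_background_region": 0,
  "startprob": [0.8, 0.15, 0.05],
  "transmat": [[0.90, 0.08, 0.02],
               [0.10, 0.80, 0.10],
               [0.02, 0.18, 0.80]]
}
"""

model = json.loads(example_json)
print(state_indices(model))
print(start_prob(model, "background"))
print(transition_prob(model, "background", "open"))
```

Looking states up by name through the i_* keys, rather than hard-coding positions 0, 1, and 2, keeps the code correct even when two model files assign the states to different indices.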