# `hmmratac` HMM model JSON file format

The HMM trained by `hmmratac` can be saved to a JSON file and loaded
again later. This lets users take a well-trained hidden Markov model,
one that captures the signals and relationships among open chromatin
regions, nucleosomes, and background in a good-quality dataset, and
reuse it on other ATAC-seq datasets. The JSON data from `hmmratac` is
a JSON dictionary containing:

1. `hmm_type` - either 'gaussian' or 'poisson' for the emission model
2. `hmm_binsize` - the bin size in base pairs, used to sample the
   signals across the genome.
3. `n_features` - this is fixed at 4. In `hmmratac`, the features used
   to train the HMM are the short-fragment, the mono-nucleosomal, the
   di-nucleosomal, and the tri-nucleosomal signals.
4. `i_open_region`/`i_nucleosomal_region`/`i_background_region` -
   the index numbers of the three states (the open region, the
   nucleosomal region, and the background region) in the emission and
   transition matrix data, starting from 0.
5. `startprob` - a list of the initial probabilities of the HMM states
   at the first bin of a candidate region for decoding. Ideally, the
   first bin should be more likely to be in the background state and
   less likely to be in the open state. Check the index numbers in
   `i_open_region`, `i_nucleosomal_region`, and `i_background_region`
   to figure out which value corresponds to which state.
6. `transmat` - the transition matrix (a list of lists) giving the
   probability that a state transitions to another state or stays the
   same. It is always a 3x3 matrix. To find the transition probability
   from state A to state B, look up the index numbers of A and B,
   take the list at index `i_state_A`, then take the value at index
   `i_state_B` within that list.
7. `lambda` - this is only available if the model is 'poisson', and
   contains the lambda values of the Poisson models. These represent
   the emission model of each of the three states, so it is a
   3 (states) x 4 (features) matrix. Check the index numbers in
   `i_open_region`, `i_nucleosomal_region`, and `i_background_region`
   to figure out which list corresponds to which state.
8. `covariance_type` - this is only available if the model type is
   'gaussian', and is currently always 'full'. 'full' means each state
   uses a full covariance matrix.
9. `mean` and `covars` - these are only available if the model is
   'gaussian'. If the emission follows a Gaussian model, each state
   has a mean vector (one value per feature) and a 4x4 matrix for the
   full covariance matrix. Check the index numbers in `i_open_region`,
   `i_nucleosomal_region`, and `i_background_region` to figure out
   which mean vector or 4x4 matrix corresponds to which state.
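
The index lookups described in the list can be sketched in Python. This is a minimal illustration rather than part of `hmmratac`. The model below is hypothetical and every number in it is invented. The helper names (`state_indices`, `transition_prob`, `has_shape`, `shape_problems`) are assumptions made for this example. The key names are taken from the list above, and the check assumes the three state indices are 0, 1, and 2, each used once.

```python
import json

# Hypothetical Poisson model; every number here is invented for illustration.
EXAMPLE_MODEL_JSON = """
{
  "hmm_type": "poisson",
  "hmm_binsize": 10,
  "n_features": 4,
  "i_open_region": 2,
  "i_nucleosomal_region": 1,
  "i_background_region": 0,
  "startprob": [0.8, 0.15, 0.05],
  "transmat": [[0.9, 0.08, 0.02],
               [0.1, 0.8, 0.1],
               [0.05, 0.15, 0.8]],
  "lambda": [[0.1, 0.2, 0.1, 0.05],
             [0.5, 2.0, 1.0, 0.5],
             [5.0, 1.5, 0.8, 0.3]]
}
"""

N_STATES = 3  # open, nucleosomal, background


def state_indices(model):
    """Map readable state names to their indices in the matrices."""
    return {
        "open": model["i_open_region"],
        "nucleosomal": model["i_nucleosomal_region"],
        "background": model["i_background_region"],
    }


def transition_prob(model, from_state, to_state):
    """Row is the source state, column is the destination state."""
    idx = state_indices(model)
    return model["transmat"][idx[from_state]][idx[to_state]]


def has_shape(matrix, n_rows, n_cols):
    """True if `matrix` is a list of n_rows lists, each of length n_cols."""
    return len(matrix) == n_rows and all(len(row) == n_cols for row in matrix)


def shape_problems(model):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    n_feat = model["n_features"]
    # Assumption: the three indices are a permutation of 0, 1, 2.
    if sorted(state_indices(model).values()) != list(range(N_STATES)):
        problems.append("state indices must be 0, 1 and 2, each used once")
    if len(model["startprob"]) != N_STATES:
        problems.append("startprob must have 3 entries")
    if not has_shape(model["transmat"], N_STATES, N_STATES):
        problems.append("transmat must be 3x3")
    if model["hmm_type"] == "poisson":
        if not has_shape(model["lambda"], N_STATES, n_feat):
            problems.append("lambda must be 3 x n_features")
    elif model["hmm_type"] == "gaussian":
        # Key names follow the field list above.
        if not has_shape(model["mean"], N_STATES, n_feat):
            problems.append("mean must be 3 x n_features")
        covars = model["covars"]
        if len(covars) != N_STATES or not all(
            has_shape(c, n_feat, n_feat) for c in covars
        ):
            problems.append("covars must be 3 matrices of n_features x n_features")
    else:
        problems.append("hmm_type must be 'gaussian' or 'poisson'")
    return problems


model = json.loads(EXAMPLE_MODEL_JSON)
print(shape_problems(model))
print(transition_prob(model, "background", "open"))
print(model["lambda"][state_indices(model)["open"]])
```

Looking states up by name through the `i_*_region` fields, rather than hard-coding row numbers, keeps the lookup correct even when two model files assign the indices in a different order.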