mastermsm.msm package¶
Submodules¶
mastermsm.msm.msm module¶
This file is part of the MasterMSM package.
-
class
mastermsm.msm.msm.
MSM
(data=None, keys=None, lagt=None, sym=False)[source]¶ Bases:
object
A class for constructing an MSM at a specific lag time.
Variables: - keys (list of str) – Names of states.
- count (np array) – Transition count matrix.
- keep_states (list of str) – Names of states after removing not strongly connected sets.
- sym (bool) – Whether we want to enforce symmetrization.
-
acf_mode
(modes=None)[source]¶ Calculate mode autocorrelation functions.
We use the procedure described by Buchete and Hummer [1]_.
Parameters: - mode (int, list) – Index(es) for sorted autocorrelation functions.
- time (list, int) – User defined range of temperatures for ACF.
Returns: corr – The mode(s) autocorrelation function.
Return type: array
References
[1] N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).
-
boots
(nboots=100, nproc=None, sliding=True)[source]¶ Bootstrap the simulation data to calculate errors.
We generate count matrices with the same number of counts as the original by bootstrapping the simulation data. The time series of discrete states are chopped into chunks to form a pool. From the pool we randomly select samples until the number of counts reaches the original.
Parameters: - nboots (int) – Number of bootstrap samples
- nproc (int) – Number of processors to use
-
calc_count_multi
(sliding=True, nproc=None)[source]¶ Calculate transition count matrix in parallel
Parameters: - sliding (bool) – Whether a sliding window is used in counts.
- nproc (int) – Number of processors to be used.
Returns: The count matrix.
Return type: array
-
calc_eigsK
(evecs=False)[source]¶ Calculate eigenvalues and eigenvectors of rate matrix K
Parameters: evecs (bool) – Whether we want the eigenvectors of the rate matrix. Returns: - tauK (numpy array) – Relaxation times from K.
- peqK (numpy array) – Equilibrium probabilities from K.
- rvecsK (numpy array, optional) – Right eigenvectors of K, sorted.
- lvecsK (numpy array, optional) – Left eigenvectors of K, sorted.
-
calc_eigsT
(evecs=False)[source]¶ Calculate eigenvalues and eigenvectors of transition matrix T.
Parameters: evecs (bool) – Whether we want the eigenvectors of the transition matrix. Returns: - tauT (numpy array) – Relaxation times from T.
- peqT (numpy array) – Equilibrium probabilities from T.
- rvecsT (numpy array, optional) – Right eigenvectors of T, sorted.
- lvecsT (numpy array, optional) – Left eigenvectors of T, sorted.
-
check_connect
()[source]¶ Check connectivity of rate matrix.
Returns: - keep_states (list) – Indexes of states from the largest strongly connected set.
- keep_keys (list) – Keys for the largest strongly connected set.
References
We use the Tarjan algorithm as implemented in NetworkX. [1]_
[1] R. Tarjan, “Depth-first search and linear graph algorithms”, SIAM Journal of Computing (1972).
-
do_count
(sliding=True)[source]¶ Calculates the count matrix.
Parameters: sliding (bool) – Whether a sliding window is used in counts.
-
do_pfold
(FF=None, UU=None, dot=False)[source]¶ Wrapper to calculate reactive fluxes and committors
We use the Berzhkovskii-Hummer-Szabo method, J. Chem. Phys. (2009)
Parameters: - FF (list) – Folded states.
- UU (list) – Unfolded states.
- dot (string) – Filename to output dot graph.
-
do_rate
(method='Taylor', evecs=False, init=False, report=False)[source]¶ Calculates the rate matrix from the transition matrix.
We use a method based on a Taylor expansion [1]_ or the maximum likelihood propagator based (MLPB) method [2].
Parameters: - evecs (bool) – Whether we want the left eigenvectors of the rate matrix.
- method (str) – Which method one wants to use. Acceptable values are ‘Taylor’ and ‘MLPB’.
- init (array) – Rate matrix to start optimization from.
- report (bool) – Whether to report the results from MC in MLPB.
References
[1] D. De Sancho, J. Mittal and R. B. Best, “Folding kinetics and unfolded state dynamics of the GB1 hairpin from molecular simulation”, J. Chem. Theory Comput. (2013). [2] N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).
-
do_trans
(evecs=False)[source]¶ Wrapper for transition matrix calculation.
Also calculates its eigenvalues and right eigenvectors.
Parameters: evecs (bool) – Whether we want the left eigenvectors of the transition matrix.
-
propagateK
(p0=None, init=None, time=None)[source]¶ Propagation of rate matrix using matrix exponential
Parameters: - p0 (string) – Filename with initial population.
- init (string) – State(s) where the population is initialized.
- time (list, array) – User defined range of temperatures for propagating the dynamics.
Returns: - popul (array) – Population of all states as a function of time.
- pnorm (array) – Population of all states as a function of time - normalized.
-
propagateT
(p0=None, init=None, time=None)[source]¶ Propagation of transition matrix
Parameters: - p0 (string) – Filename with initial population.
- init (string) – State(s) where the population is initialized.
- time (list, int) – User defined range of temperatures for propagating the dynamics.
Returns: - popul (array) – Population of all states as a function of time.
- pnorm (array) – Population of all states as a function of time - normalized.
Notes
There is probably just one essential difference between propagateT and propagateK. We are obtaining the time evolution of the population towards the equilibrium distribution. Using K, we can obtain the population at any given time (P(t) = exp(Kt)P0), while here we are limited to powers of the transition matrix (hence, P(nt) = [T(t)**n]*P0).
-
sensitivity
(FF=None, UU=None, dot=False)[source]¶ Sensitivity analysis of the states in the network.
We use the procedure described by De Sancho, Kubas, Blumberger and Best [1]_.
Parameters: - FF (list) – Folded states.
- UU (list) – Unfolded states.
Returns: - dJ (list) – Derivative of flux.
- d_peq (list) – Derivative of equilibrium populations.
- d_kf (list) – Derivative of global rate.
- kf (float) – Global rate.
References
[1] - De Sancho, A. Kubas, P.-H. Wang, J. Blumberger, R. B.
Best “Identification of mutational hot spots for substrate diffusion: Application to myoglobin”, J. Chem. Theory Comput. (2015).
-
class
mastermsm.msm.msm.
SuperMSM
(trajs, file_keys=None, keys=None, sym=False)[source]¶ Bases:
object
A class for constructing the MSM
Variables: - data (list) – A list of instances of the TimeSeries class
- keys (list of str) – Names of states.
- msms (dict) – A dictionary containing MSMs for different lag times.
- sym (bool) – Whether we want to enforce symmetrization.
-
ck_test
(init=None, sliding=True, error=True, out=None, time=None)[source]¶ Carry out Chapman-Kolmogorov test.
We follow the procedure described by Prinz et al.[1]_
Parameters: - init (str) – The states from which the relaxation should be simulated.
- sliding (bool) – Whether a sliding window is used to count transitions.
- error (bool) – Whether errors are calculated.
References
[1] J.-H. Prinz et al, “Markov models of molecular kinetics: Generation and validation”, J. Chem. Phys. (2011).
-
convergence_test
(save=None, N=1, sliding=True, error=True, time=None, out=None)[source]¶ Carry out convergence test for relaxation times.
Parameters: - N (int) – The number of modes for which we are building the MSM.
- sliding (bool) – Whether a sliding window should be used or not.
- error (bool) – Whether to include errors or not.
- time (array) – The range of times that must be used.
- save (bool, str) – Whether we want to save to file.
-
do_lbrate
(evecs=False, error=False)[source]¶ Calculates the rate matrix using the lifetime based method.
We use the method described by Buchete and Hummer [1]_.
Parameters: - evecs (bool) – Whether we want the left eigenvectors of the rate matrix.
- error (bool) – Whether to include errors or not.
References
[1] N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).
mastermsm.msm.msm_lib module¶
This file is part of the MasterMSM package.
-
mastermsm.msm.msm_lib.
calc_count_worker
(x)[source]¶ mp worker that calculates the count matrix from a trajectory
Parameters: x (list) – List containing input for each mp worker. Includes: distraj :the time series of states dt : the timestep for that trajectory keys : the keys used in the assignment lagt : the lag time for construction Returns: count Return type: array
-
mastermsm.msm.msm_lib.
calc_eigsK
(rate, evecs=False)[source]¶ Calculate eigenvalues and eigenvectors of rate matrix K
Parameters: - rate (array) – The rate matrix to use.
- evecs (bool) – Whether we want the eigenvectors of the rate matrix.
- Returns –
- ------- –
- tauK (numpy array) – Relaxation times from K.
- peqK (numpy array) – Equilibrium probabilities from K.
- rvecsK (numpy array, optional) – Right eigenvectors of K, sorted.
- lvecsK (numpy array, optional) – Left eigenvectors of K, sorted.
-
mastermsm.msm.msm_lib.
calc_lifetime
(x)[source]¶ mp worker that calculates the count matrix from a trajectory
Parameters: x (list) – List containing input for each mp worker. Includes: distraj :the time series of states dt : the timestep for that trajectory keys : the keys used in the assignment Returns: life Return type: dict
-
mastermsm.msm.msm_lib.
calc_mlrate
(nkeep, count, lagt, rate_init)[source]¶ Calculate rate matrix using maximum likelihood Bayesian method.
We use a the MLPB method described by Buchete and Hummer.[1]_
Parameters: - nkeep (int) – Number of states in transition matrix.
- count (np.array) – Transition matrix.
- lagt (float) – The lag time.
Returns: rate – The rate matrix.
Return type: np.array
Notes
- ..[1] N.-V. Buchete and G. Hummer, “Coarse master equations for
- peptide folding dynamics”, J. Phys. Chem. B (2008).
-
mastermsm.msm.msm_lib.
calc_rate
(nkeep, trans, lagt)[source]¶ Calculate rate matrix from transition matrix.
We use a method based on a Taylor expansion.[1]_
Parameters: - nkeep (int) – Number of states in transition matrix.
- trans (np.array) – Transition matrix.
- lagt (float) – The lag time.
Returns: rate – The rate matrix.
Return type: np.array
Notes
..[1] D. De Sancho, J. Mittal and R. B. Best, “Folding kinetics and unfolded state dynamics of the GB1 hairpin from molecular simulation”, J. Chem. Theory Comput. (2013).
-
mastermsm.msm.msm_lib.
calc_trans
(nkeep=None, keep_states=None, count=None)[source]¶ Calculates transition matrix.
Uses the maximum likelihood expression by Prinz et al.[1]_
Parameters: lagt (float) – Lag time for construction of MSM. Returns: trans – The transition probability matrix. Return type: array Notes
..[1] J. H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schutte and F. Noe, “Markov state models: Generation and validation”, J. Chem. Phys. (2011).
-
mastermsm.msm.msm_lib.
detailed_balance
(nkeep, rate, peq)[source]¶ Enforce detailed balance in rate matrix.
Parameters: - nkeep (int) – The number of states.
- rate (array) – The rate matrix obeying detailed balance.
- peq (array) – The equilibrium probability
-
mastermsm.msm.msm_lib.
do_boots_worker
(x)[source]¶ Worker function for parallel bootstrapping.
Parameters: x (list) – A list containing the trajectory filename, the states, the lag time and the total number of transitions.
-
mastermsm.msm.msm_lib.
esort
(ei, ej)[source]¶ Sorts eigenvalues.
Parameters: - ei (float) – Eigenvalue i
- ej (float) – Eigenvalue j
Returns: Whether the first value is larger than the second.
Return type: bool
-
mastermsm.msm.msm_lib.
likelihood
(nkeep, rate, count, lagt)[source]¶ Likelihood of a rate matrix given a count matrix
We use the procedure described by Buchete and Hummer.[1]_
Parameters: - nkeep (int) – Number of states in transition matrix.
- count (np.array) – Transition matrix.
- lagt (float) – The lag time.
Returns: mlog_like – The log likelihood
Return type: float
Notes
- ..[1] N.-V. Buchete and G. Hummer, “Coarse master equations for
- peptide folding dynamics”, J. Phys. Chem. B (2008).
-
mastermsm.msm.msm_lib.
mat_mul_v
(m, v)[source]¶ Multiplies matrix and vector
Parameters: - m (np.array) – The matrix.
- v (np.array) – The vector.
Returns: w – The result
Return type: np.array
-
mastermsm.msm.msm_lib.
matrix_ave
(mat_boots, keep_keys_boots, keys)[source]¶ Return averages from bootstrap results
Parameters: - mat_boots (list) – List of matrix arrays
- keep_keys_boots (list) – List of key lists
- keys (list) – List of keys
- Returns –
- ------- –
- mat_ave (array) – Matrix averages
- mat_std (array) – Matrix std
-
mastermsm.msm.msm_lib.
mc_move
(nkeep, rate, peq)[source]¶ Make MC move in either rate or equilibrium probability.
Changes in equilibrium probabilities are introduced so that the new value is drawn from a normal distribution centered at the current value.
Parameters: - nkeep (int) – The number of states.
- rate (array) – The rate matrix obeying detailed balance.
- peq (array) – The equilibrium probability
-
mastermsm.msm.msm_lib.
partial_flux
(states, peq, K, pfold, d_peq, d_K, d_pfold, target)[source]¶ Calculates derivative of reactive flux
-
mastermsm.msm.msm_lib.
partial_peq
(peq, elem)[source]¶ Calculates derivative of equilibrium distribution
Parameters: peq (np.array) – Equilibrium probabilities.
-
mastermsm.msm.msm_lib.
partial_pfold
(states, K, d_K, FF, UU, elem)[source]¶ Calculates derivative of pfold
-
mastermsm.msm.msm_lib.
partial_rate
(K, elem)[source]¶ Calculates the derivative of the rate matrix
Parameters: - K (np.array) – The rate matrix.
- elem (int) – Integer corresponding to which we calculate the partial derivative.
Returns: d_K – Partial derivative of rate matrix.
Return type: np.array
-
mastermsm.msm.msm_lib.
peq_averages
(peq_boots, keep_keys_boots, keys)[source]¶ Return averages from bootstrap results
Parameters: - peq_boots (list) – List of Peq arrays
- keep_keys_boots (list) – List of key lists
- keys (list) – List of keys
- Returns –
- ------- –
- peq_ave (array) – Peq averages
- peq_std (array) – Peq std
-
mastermsm.msm.msm_lib.
propagateT_worker
(x)[source]¶ Propagate dynamics using power of transition matrix
Parameters: x (list) – Contains T, the power and initial population Returns: popul – The propagated population Return type: np.array
-
mastermsm.msm.msm_lib.
propagate_worker
(x)[source]¶ Propagate dynamics using rate matrix exponential
Parameters: x (list) – Contains K, the time and the initial population Returns: popul – The propagated population Return type: np.array
-
mastermsm.msm.msm_lib.
rand_rate
(nkeep, count)[source]¶ Randomly generate initial matrix.
Parameters: - nkeep (int) – Number of states in transition matrix.
- count (np.array) – Transition matrix.
Returns: rand_rate – The random rate matrix.
Return type: np.array
-
mastermsm.msm.msm_lib.
run_commit
(states, K, peq, FF, UU)[source]¶ Calculate committors and reactive flux
Parameters: - states (list) – States in the MSM.
- K (np.array) – Rate matrix.
- peq (np.array) – Equilibrium distribution.
- FF (list) – Definitely folded states.
- UU (list) – Definitely unfolded states.
Returns: - J (np.array) – Reactive flux matrix.
- pfold (np.array) – Values of the committor.
- sum_flux (float) – Sum of reactive fluxes.
- kf (float) – Folding rate from flux over population relationship.
-
mastermsm.msm.msm_lib.
tau_averages
(tau_boots, keys)[source]¶ Return averages from bootstrap results
Parameters: - tau_boots (list) – List of Tau arrays
- Returns –
- ------- –
- tau_ave (array) – Tau averages
- tau_std (array) – Tau std
-
mastermsm.msm.msm_lib.
traj_split
(data=None, lagt=None, fdboots=None)[source]¶ Splits trajectories into fragments for bootstrapping
Parameters: - data (list) – Set of trajectories used for building the MSM.
- lagt (float) – Lag time for building the MSM.
- Returns –
- ------- –
- filetmp (file object) – Open file object with trajectory fragments.