mastermsm.msm package¶

Submodules¶

mastermsm.msm.msm module¶

This file is part of the MasterMSM package.

class mastermsm.msm.msm.MSM(data=None, keys=None, lagt=None, sym=False)[source]¶

Bases: object

A class for constructing an MSM at a specific lag time.

Variables:	keys (list of str) – Names of states. count (np array) – Transition count matrix. keep_states (list of str) – Names of states after removing not strongly connected sets. sym (bool) – Whether we want to enforce symmetrization.

acf_mode(modes=None)[source]¶

Calculate mode autocorrelation functions.

We use the procedure described by Buchete and Hummer [1]_.

Parameters:	mode (int, list) – Index(es) for sorted autocorrelation functions. time (list, int) – User defined range of temperatures for ACF.
Returns:	corr – The mode(s) autocorrelation function.
Return type:	array

References

[1]	N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).

boots(nboots=100, nproc=None, sliding=True)[source]¶

Bootstrap the simulation data to calculate errors.

We generate count matrices with the same number of counts as the original by bootstrapping the simulation data. The time series of discrete states are chopped into chunks to form a pool. From the pool we randomly select samples until the number of counts reaches the original.

Parameters:	nboots (int) – Number of bootstrap samples nproc (int) – Number of processors to use

calc_count_multi(sliding=True, nproc=None)[source]¶

Calculate transition count matrix in parallel

Parameters:	sliding (bool) – Whether a sliding window is used in counts. nproc (int) – Number of processors to be used.
Returns:	The count matrix.
Return type:	array

calc_eigsK(evecs=False)[source]¶

Calculate eigenvalues and eigenvectors of rate matrix K

Parameters:	evecs (bool) – Whether we want the eigenvectors of the rate matrix.
Returns:	tauK (numpy array) – Relaxation times from K. peqK (numpy array) – Equilibrium probabilities from K. rvecsK (numpy array, optional) – Right eigenvectors of K, sorted. lvecsK (numpy array, optional) – Left eigenvectors of K, sorted.

calc_eigsT(evecs=False)[source]¶

Calculate eigenvalues and eigenvectors of transition matrix T.

Parameters:	evecs (bool) – Whether we want the eigenvectors of the transition matrix.
Returns:	tauT (numpy array) – Relaxation times from T. peqT (numpy array) – Equilibrium probabilities from T. rvecsT (numpy array, optional) – Right eigenvectors of T, sorted. lvecsT (numpy array, optional) – Left eigenvectors of T, sorted.

check_connect()[source]¶

Check connectivity of rate matrix.

Returns:	keep_states (list) – Indexes of states from the largest strongly connected set. keep_keys (list) – Keys for the largest strongly connected set.

References

We use the Tarjan algorithm as implemented in NetworkX. [1]_

[1]	R. Tarjan, “Depth-first search and linear graph algorithms”, SIAM Journal of Computing (1972).

do_count(sliding=True)[source]¶

Calculates the count matrix.

Parameters:	sliding (bool) – Whether a sliding window is used in counts.

do_pfold(FF=None, UU=None, dot=False)[source]¶

Wrapper to calculate reactive fluxes and committors

We use the Berzhkovskii-Hummer-Szabo method, J. Chem. Phys. (2009)

Parameters:	FF (list) – Folded states. UU (list) – Unfolded states. dot (string) – Filename to output dot graph.

do_rate(method='Taylor', evecs=False, init=False, report=False)[source]¶

Calculates the rate matrix from the transition matrix.

We use a method based on a Taylor expansion [1]_ or the maximum likelihood propagator based (MLPB) method [2].

Parameters:	evecs (bool) – Whether we want the left eigenvectors of the rate matrix. method (str) – Which method one wants to use. Acceptable values are ‘Taylor’ and ‘MLPB’. init (array) – Rate matrix to start optimization from. report (bool) – Whether to report the results from MC in MLPB.

References

[1]	D. De Sancho, J. Mittal and R. B. Best, “Folding kinetics and unfolded state dynamics of the GB1 hairpin from molecular simulation”, J. Chem. Theory Comput. (2013).

[2]	N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).

do_trans(evecs=False)[source]¶

Wrapper for transition matrix calculation.

Also calculates its eigenvalues and right eigenvectors.

Parameters:	evecs (bool) – Whether we want the left eigenvectors of the transition matrix.

propagateK(p0=None, init=None, time=None)[source]¶

Propagation of rate matrix using matrix exponential

Parameters:

p0 (string) – Filename with initial population.
init (string) – State(s) where the population is initialized.
time (list, array) – User defined range of temperatures for propagating the dynamics.

Returns:

popul (array) – Population of all states as a function of time.
pnorm (array) – Population of all states as a function of time - normalized.

propagateT(p0=None, init=None, time=None)[source]¶

Propagation of transition matrix

Parameters:

p0 (string) – Filename with initial population.
init (string) – State(s) where the population is initialized.
time (list, int) – User defined range of temperatures for propagating the dynamics.

Returns:

popul (array) – Population of all states as a function of time.
pnorm (array) – Population of all states as a function of time - normalized.

Notes

There is probably just one essential difference between propagateT and propagateK. We are obtaining the time evolution of the population towards the equilibrium distribution. Using K, we can obtain the population at any given time (P(t) = exp(Kt)P0), while here we are limited to powers of the transition matrix (hence, P(nt) = [T(t)**n]*P0).

sensitivity(FF=None, UU=None, dot=False)[source]¶

Sensitivity analysis of the states in the network.

We use the procedure described by De Sancho, Kubas, Blumberger and Best [1]_.

Parameters:

FF (list) – Folded states.
UU (list) – Unfolded states.

Returns:

dJ (list) – Derivative of flux.
d_peq (list) – Derivative of equilibrium populations.
d_kf (list) – Derivative of global rate.
kf (float) – Global rate.

References

[1]	De Sancho, A. Kubas, P.-H. Wang, J. Blumberger, R. B.

Best “Identification of mutational hot spots for substrate diffusion: Application to myoglobin”, J. Chem. Theory Comput. (2015).

class mastermsm.msm.msm.SuperMSM(trajs, file_keys=None, keys=None, sym=False)[source]¶

Bases: object

A class for constructing the MSM

Variables:	data (list) – A list of instances of the TimeSeries class keys (list of str) – Names of states. msms (dict) – A dictionary containing MSMs for different lag times. sym (bool) – Whether we want to enforce symmetrization.

ck_test(init=None, sliding=True, error=True, out=None, time=None)[source]¶

Carry out Chapman-Kolmogorov test.

We follow the procedure described by Prinz et al.[1]_

Parameters:	init (str) – The states from which the relaxation should be simulated. sliding (bool) – Whether a sliding window is used to count transitions. error (bool) – Whether errors are calculated.

References

[1]	J.-H. Prinz et al, “Markov models of molecular kinetics: Generation and validation”, J. Chem. Phys. (2011).

convergence_test(save=None, N=1, sliding=True, error=True, time=None, out=None)[source]¶

Carry out convergence test for relaxation times.

Parameters:	N (int) – The number of modes for which we are building the MSM. sliding (bool) – Whether a sliding window should be used or not. error (bool) – Whether to include errors or not. time (array) – The range of times that must be used. save (bool, str) – Whether we want to save to file.

do_lbrate(evecs=False, error=False)[source]¶

Calculates the rate matrix using the lifetime based method.

We use the method described by Buchete and Hummer [1]_.

Parameters:	evecs (bool) – Whether we want the left eigenvectors of the rate matrix. error (bool) – Whether to include errors or not.

References

[1]	N.-V. Buchete and G. Hummer, “Coarse master equations for peptide folding dynamics”, J. Phys. Chem. B (2008).

do_msm(lagt, sliding=True)[source]¶

Construct MSM for specific value of lag time.

Parameters:	lagt (float) – The lag time. sliding (bool) – Whether a sliding window is used in counts.

mastermsm.msm.msm_lib module¶

This file is part of the MasterMSM package.

mastermsm.msm.msm_lib.calc_count_worker(x)[source]¶

mp worker that calculates the count matrix from a trajectory

Parameters:	x (list) – List containing input for each mp worker. Includes: distraj :the time series of states dt : the timestep for that trajectory keys : the keys used in the assignment lagt : the lag time for construction
Returns:	count
Return type:	array

mastermsm.msm.msm_lib.calc_eigsK(rate, evecs=False)[source]¶

Calculate eigenvalues and eigenvectors of rate matrix K

Parameters:

rate (array) – The rate matrix to use.
evecs (bool) – Whether we want the eigenvectors of the rate matrix.
Returns –
------- –
tauK (numpy array) – Relaxation times from K.
peqK (numpy array) – Equilibrium probabilities from K.
rvecsK (numpy array, optional) – Right eigenvectors of K, sorted.
lvecsK (numpy array, optional) – Left eigenvectors of K, sorted.

mastermsm.msm.msm_lib.calc_lifetime(x)[source]¶

mp worker that calculates the count matrix from a trajectory

Parameters:	x (list) – List containing input for each mp worker. Includes: distraj :the time series of states dt : the timestep for that trajectory keys : the keys used in the assignment
Returns:	life
Return type:	dict

mastermsm.msm.msm_lib.calc_mlrate(nkeep, count, lagt, rate_init)[source]¶

Calculate rate matrix using maximum likelihood Bayesian method.

We use a the MLPB method described by Buchete and Hummer.[1]_

Parameters:	nkeep (int) – Number of states in transition matrix. count (np.array) – Transition matrix. lagt (float) – The lag time.
Returns:	rate – The rate matrix.
Return type:	np.array

Notes

..[1] N.-V. Buchete and G. Hummer, “Coarse master equations for: peptide folding dynamics”, J. Phys. Chem. B (2008).

mastermsm.msm.msm_lib.calc_rate(nkeep, trans, lagt)[source]¶

Calculate rate matrix from transition matrix.

We use a method based on a Taylor expansion.[1]_

Parameters:	nkeep (int) – Number of states in transition matrix. trans (np.array) – Transition matrix. lagt (float) – The lag time.
Returns:	rate – The rate matrix.
Return type:	np.array

Notes

..[1] D. De Sancho, J. Mittal and R. B. Best, “Folding kinetics and unfolded state dynamics of the GB1 hairpin from molecular simulation”, J. Chem. Theory Comput. (2013).

mastermsm.msm.msm_lib.calc_trans(nkeep=None, keep_states=None, count=None)[source]¶

Calculates transition matrix.

Uses the maximum likelihood expression by Prinz et al.[1]_

Parameters:	lagt (float) – Lag time for construction of MSM.
Returns:	trans – The transition probability matrix.
Return type:	array

Notes

..[1] J. H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schutte and F. Noe, “Markov state models: Generation and validation”, J. Chem. Phys. (2011).

mastermsm.msm.msm_lib.detailed_balance(nkeep, rate, peq)[source]¶

Enforce detailed balance in rate matrix.

Parameters:	nkeep (int) – The number of states. rate (array) – The rate matrix obeying detailed balance. peq (array) – The equilibrium probability

mastermsm.msm.msm_lib.do_boots_worker(x)[source]¶

Worker function for parallel bootstrapping.

Parameters:	x (list) – A list containing the trajectory filename, the states, the lag time and the total number of transitions.

mastermsm.msm.msm_lib.esort(ei, ej)[source]¶

Sorts eigenvalues.

Parameters:	ei (float) – Eigenvalue i ej (float) – Eigenvalue j
Returns:	Whether the first value is larger than the second.
Return type:	bool

mastermsm.msm.msm_lib.likelihood(nkeep, rate, count, lagt)[source]¶

Likelihood of a rate matrix given a count matrix

We use the procedure described by Buchete and Hummer.[1]_

Parameters:	nkeep (int) – Number of states in transition matrix. count (np.array) – Transition matrix. lagt (float) – The lag time.
Returns:	mlog_like – The log likelihood
Return type:	float

Notes

..[1] N.-V. Buchete and G. Hummer, “Coarse master equations for: peptide folding dynamics”, J. Phys. Chem. B (2008).

mastermsm.msm.msm_lib.mat_mul_v(m, v)[source]¶

Multiplies matrix and vector

Parameters:	m (np.array) – The matrix. v (np.array) – The vector.
Returns:	w – The result
Return type:	np.array

mastermsm.msm.msm_lib.matrix_ave(mat_boots, keep_keys_boots, keys)[source]¶

Return averages from bootstrap results

Parameters:	mat_boots (list) – List of matrix arrays keep_keys_boots (list) – List of key lists keys (list) – List of keys Returns – ------- – mat_ave (array) – Matrix averages mat_std (array) – Matrix std

mastermsm.msm.msm_lib.mc_move(nkeep, rate, peq)[source]¶

Make MC move in either rate or equilibrium probability.

Changes in equilibrium probabilities are introduced so that the new value is drawn from a normal distribution centered at the current value.

Parameters:	nkeep (int) – The number of states. rate (array) – The rate matrix obeying detailed balance. peq (array) – The equilibrium probability

mastermsm.msm.msm_lib.partial_flux(states, peq, K, pfold, d_peq, d_K, d_pfold, target)[source]¶: Calculates derivative of reactive flux

mastermsm.msm.msm_lib.partial_peq(peq, elem)[source]¶

Calculates derivative of equilibrium distribution

Parameters:	peq (np.array) – Equilibrium probabilities.

mastermsm.msm.msm_lib.partial_pfold(states, K, d_K, FF, UU, elem)[source]¶: Calculates derivative of pfold

mastermsm.msm.msm_lib.partial_rate(K, elem)[source]¶

Calculates the derivative of the rate matrix

Parameters:	K (np.array) – The rate matrix. elem (int) – Integer corresponding to which we calculate the partial derivative.
Returns:	d_K – Partial derivative of rate matrix.
Return type:	np.array

mastermsm.msm.msm_lib.peq_averages(peq_boots, keep_keys_boots, keys)[source]¶

Return averages from bootstrap results

Parameters:	peq_boots (list) – List of Peq arrays keep_keys_boots (list) – List of key lists keys (list) – List of keys Returns – ------- – peq_ave (array) – Peq averages peq_std (array) – Peq std

mastermsm.msm.msm_lib.propagateT_worker(x)[source]¶

Propagate dynamics using power of transition matrix

Parameters:	x (list) – Contains T, the power and initial population
Returns:	popul – The propagated population
Return type:	np.array

mastermsm.msm.msm_lib.propagate_worker(x)[source]¶

Propagate dynamics using rate matrix exponential

Parameters:	x (list) – Contains K, the time and the initial population
Returns:	popul – The propagated population
Return type:	np.array

mastermsm.msm.msm_lib.rand_rate(nkeep, count)[source]¶

Randomly generate initial matrix.

Parameters:	nkeep (int) – Number of states in transition matrix. count (np.array) – Transition matrix.
Returns:	rand_rate – The random rate matrix.
Return type:	np.array

mastermsm.msm.msm_lib.run_commit(states, K, peq, FF, UU)[source]¶

Calculate committors and reactive flux

Parameters:

states (list) – States in the MSM.
K (np.array) – Rate matrix.
peq (np.array) – Equilibrium distribution.
FF (list) – Definitely folded states.
UU (list) – Definitely unfolded states.

Returns:

J (np.array) – Reactive flux matrix.
pfold (np.array) – Values of the committor.
sum_flux (float) – Sum of reactive fluxes.
kf (float) – Folding rate from flux over population relationship.

mastermsm.msm.msm_lib.tau_averages(tau_boots, keys)[source]¶

Return averages from bootstrap results

Parameters:	tau_boots (list) – List of Tau arrays Returns – ------- – tau_ave (array) – Tau averages tau_std (array) – Tau std

mastermsm.msm.msm_lib.traj_split(data=None, lagt=None, fdboots=None)[source]¶

Splits trajectories into fragments for bootstrapping

Parameters:	data (list) – Set of trajectories used for building the MSM. lagt (float) – Lag time for building the MSM. Returns – ------- – filetmp (file object) – Open file object with trajectory fragments.

mastermsm.msm package¶

Submodules¶

mastermsm.msm.msm module¶

mastermsm.msm.msm_lib module¶

Module contents¶

MasterMSM

Navigation

Related Topics