remain_prob

mdtools.dtrj.remain_prob(dtrj, restart=1, continuous=False, discard_neg_start=False, discard_all_neg=False, verbose=False)[source]

Calculate the probability that a compound is still (or again) in the same state as at time \(t_0\) after a lag time \(\Delta t\).

Take a discrete trajectory and calculate the probability to find a compound in the same state as at time \(t_0\) after a lag time \(\Delta t\).

Important

\(t_0\) is an arbitrary time point in the trajectory. It is particularly not the time of a state transition!

Parameters:
  • dtrj (array_like) – The discrete trajectory. Array of shape (n, f), where n is the number of compounds and f is the number of frames. The shape can also be (f,), in which case the array is expanded to shape (1, f). The elements of dtrj are interpreted as the indices of the states in which a given compound is at a given frame.

  • restart (int, optional) – Restart every restart frames. Determines how many restarting points \(t_0\) are used for averaging.

  • continuous (bool, optional) – If True, compounds must continuously be in the same state without interruption in order to be counted (see notes).

  • discard_neg_start (bool, optional) – If True, discard all transitions starting from a negative state (see notes). This is equivalent to discarding the lifetimes of all negative states when calculating state lifetimes with mdtools.dtrj.lifetimes(). Must not be used together with discard_all_neg.

  • discard_all_neg (bool, optional) – If True, discard all transitions starting from or ending in a negative state (see notes). This is equivalent to discarding the lifetimes of all negative states and of all states that are followed by a negative state when calculating state lifetimes with mdtools.dtrj.lifetimes(). Must not be used together with discard_neg_start.

  • verbose (bool, optional) – If True print a progress bar.

Returns:

prob (numpy.ndarray) – Array of shape (f,) containing for all possible lag times the probability that a compound is in the same state as at time \(t_0\) after a lag time \(\Delta t\).

See also

mdtools.dtrj.remain_prob_discrete()

Calculate the probability that a compound is still (or again) in the same state as at time \(t_0\) after a lag time \(\Delta t\) resolved with respect to the states in second discrete trajectory

mdtools.dtrj.back_jump_prob()

Calculate the back-jump probability averaged over all states

mdtools.dtrj.leave_prob()

Calculate the probability that a compound leaves its state after a lag time \(\Delta t\) given that it has entered the state at time \(t_0\)

mdtools.dtrj.kaplan_meier()

Estimate the state survival function using the Kaplan-Meier estimator

mdtools.dtrj.trans_rate()

Calculate the transition rate for each compound averaged over all states

mdtools.dtrj.lifetimes()

Calculate the state lifetimes for each compound

Notes

If you want to compute the probability to stay in the same state for each individual state in a discrete trajectory, use mdtools.dtrj.remain_prob_discrete() and parse the same discrete trajectory to dtrj1 and dtrj2.

Continuous and Discontinuous Probability

If continuous is False, this function calculates the probability to be still or again in the same state as at time \(t_0\) after a lag \(\Delta t\):

\[p(\xi(t_0 + \Delta t) \in S | \xi(t_0) \in S) = \left\langle \frac{N(t_0, t_0 + \Delta t)}{N(t_0)} \right\rangle_t\]

Here, \(\xi(t)\) is the discretized coordinate and \(S\) is a given valid discrete state. \(N(t_0)\) is the number of compounds that are in a valid state at time \(t_0\) and \(N(t_0, t_0 + \Delta t)\) is the number of compounds that are at time \(t_0 + \Delta t\) still or again in the same valid state as at time \(t_0\). The brackets \(\langle ... \rangle_t\) denote averaging over all given restarting times \(t_0\).

If continuous is True, this function calculates the probability to be still in the same state as at time \(t_0\) after a lag time \(\Delta t\):

\[p(\xi(t) \in S ~\forall t \in [t_0, t_0 + \Delta t]) = \left\langle \frac{N(\{t \in [t_0, t_0 + \Delta t]\})}{N(t_0)} \right\rangle_t\]

\(N(\{t \in [t_0, t_0 + \Delta t]\})\) is the number of compounds that continuously stay in their initial valid state for all times from \(t_0\) to \(t_0 + \Delta t\).

Valid and Invalid States

By default, all states in the given discrete trajectory are valid. However, in some cases you might want to treat certain states as invalid, e.g. because you only want to consider specific states. This can be achieved with the arguments discard_neg_start or discard_all_neg which treat negative states (i.e. states with a negative state number/index) as invalid states.

Note

If you want to treat states with index zero as invalid, too, simply subtract one from dtrj. If you want to treat all states below a certain cutoff as invalid, subtract this cutoff from dtrj. If you want to treat certain states as invalid, make these states negative, e.g. by multiplying these states with minus one.

The arguments discard_neg_start and discard_all_neg affect the counting of compounds in a given state, i.e. they affect \(N(t_0)\), \(N(t_0, t_0 + \Delta t)\) and \(N(\{t \in [t_0, t_0 + \Delta t]\})\). In general, \(N\) is the sum of compounds that are in a valid state: \(N = \sum_i N_i\). \(N_i\) is \(1\) if compound \(i\) is in a valid state and \(0\) otherwise. discard_neg_start and discard_all_neg now affect when a state is considered valid.

If both, discard_neg_start and discard_all_neg, are False (the default), all states are valid.

\[N_i(t_0) = 1 ~\forall S_i(t_0)\]
\[\begin{split}N_i(t_0, t_0 + \Delta t) = \begin{cases} 1 & S_i(t_0 + \Delta t) = S_i(t_0) \\ 0 & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}N_i(\{t \in [t_0, t_0 + \Delta t]\}) = \begin{cases} 1 & S_i(t) = S_i(t_0) ~\forall t \in [t_0, t_0 + \Delta t]\\ 0 & \text{otherwise} \end{cases}\end{split}\]

Here, \(S_i(t)\) is the state of compound \(i\) at time \(t\).

If discard_neg_start is True, transitions starting from negative states are discarded:

\[\begin{split}N_i(t_0) = \begin{cases} 1 & S_i(t_0) \geq 0 \\ 0 & S_i(t_0) < 0 \end{cases}\end{split}\]
\[\begin{split}N_i(t_0, t_0 + \Delta t) = \begin{cases} 1 & S_i(t_0) \geq 0 \text{ and } \\ & S_i(t_0 + \Delta t) = S_i(t_0) \\ 0 & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}N_i(\{t \in [t_0, t_0 + \Delta t]\}) = \begin{cases} 1 & S_i(t_0) \geq 0 \text{ and } \\ & S_i(t) = S_i(t_0) ~\forall t \in [t_0, t_0 + \Delta t]\\ 0 & \text{otherwise} \end{cases}\end{split}\]

Thus, the resulting probability does not contain the probability to stay in a negative state. However, transitions from positive to negative states are respected. Hence, the probability to stay in a positive state is decreased if a transition to a negative states occurs (in addition to the decrease caused by transitions to other positive states).

If discard_all_neg is True, all negative states are discarded:

\[\begin{split}N_i(t_0) = \begin{cases} 1 & S_i(t) \geq 0 ~\forall t \in [t_0, t_0 + \Delta t] \\ 0 & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}N_i(t_0, t_0 + \Delta t) = \begin{cases} 1 & S_i(t) \geq 0 ~\forall t \in [t_0, t_0 + \Delta t] \text{ and } \\ & S_i(t_0 + \Delta t) = S_i(t_0) \\ 0 & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}N_i(\{t \in [t_0, t_0 + \Delta t]\}) = \begin{cases} 1 & S_i(t) \geq 0 ~\forall t \in [t_0, t_0 + \Delta t] \text{ and } \\ & S_i(t) = S_i(t_0) ~\forall t \in [t_0, t_0 + \Delta t]\\ 0 & \text{otherwise} \end{cases}\end{split}\]

This means transitions from or to negative states are completely discarded. Thus, the resulting probability does neither contain the probability to stay in a negative state nor is it affected by transitions from positive to negative states.

Lifetimes

Important

Because the compound might have already been in the same state before the start of the observation at time \(t_0\), the continuous probability is not the survival function of the underlying distribution of lifetimes, but rather the survival function of the underlying distribution of remaining/residual lifetimes. That is the time after which a compound that is randomly picked from the state will leave the state, regardless how long it has already been in this state. However, because long-lived states are sampled more often, the continuous (and discontinuous) probability to be still (or again) in the same state is biased to longer lifetimes. Also note that the discontinuous probability is just a measure for the state relaxation but is not directly related to the state lifetime.[6],[7] Thus, the following paragraphs have to be read with caution!

To calculate the survival function of the true lifetime distribution, use mdtools.dtrj.surv_func().

The calculated probabilities can be used to calculate the average lifetime of the valid states, i.e. how long a compounds resides on average in a valid state. A common approach is to fit the probability with a stretched exponential function \(f(t)\) (e.g. using mdtools.functions.fit_kww()) and afterwards calculating the area below the fitted curve:[1],[2],[3],[4]

\[f(t) = \exp{\left[ -\left(\frac{t}{\tau_0}\right)^\beta \right]}\]
\[\langle \tau \rangle = \int_0^\infty f(t) \text{ d}t = \frac{\tau_0}{\beta} \Gamma\left(\frac{1}{\beta}\right)\]

Here, \(\tau_0\) and \(\beta\) are the fit parameters, \(\langle \tau \rangle\) is the average lifetime and \(\Gamma(x)\) is the gamma function. For physically meaningful results, \(\beta\) should be confined to \(0 < \beta \leq 1\). For purely exponential decay (\(\beta = 1\)), \(\tau = \tau_0\) applies. If the calculated probability \(p(\Delta t)\) fully decays to zero within the accessible range of lag times, one can alternatively numerically integrate the calculated probability directly, instead of the fit \(f(t)\).

In general, the n-th moment of the underlying distribution of lifetimes is given by:[3],[4],[5]

\[\langle \tau^n \rangle = \frac{1}{(n-1)!} \int_0^\infty t^{n-1} f(t) \text{ d}t = \frac{\tau_0^n}{\beta} \frac{\Gamma\left(\frac{n}{\beta}\right)}{\Gamma(n)}\]

Moreover, the time-averaged lifetime is given by:[3]

\[\bar{\tau^n} = \frac{ \int_0^\infty t^n f(t) \text{ d}t }{ \int_0^\infty f(t) \text{ d}t } = \frac{\langle \tau^{n+1} \rangle}{\langle \tau \rangle} n! = \tau_0^n \frac{ \Gamma\left(\frac{n+1}{\beta}\right) }{ \Gamma\left(\frac{1}{\beta}\right) }\]

References

Examples

>>> dtrj = np.array([[2, 2, 3, 3, 3]])
>>> mdt.dtrj.remain_prob(dtrj)
array([1.        , 0.75      , 0.33333333, 0.        , 0.        ])
>>> mdt.dtrj.remain_prob(dtrj, continuous=True)
array([1.        , 0.75      , 0.33333333, 0.        , 0.        ])
>>> dtrj = np.array([[1, 3, 3, 3, 1]])
>>> mdt.dtrj.remain_prob(dtrj)
array([1.        , 0.5       , 0.33333333, 0.        , 1.        ])
>>> mdt.dtrj.remain_prob(dtrj, continuous=True)
array([1.        , 0.5       , 0.33333333, 0.        , 0.        ])

The following examples were not checked to be mathematically correct!

>>> dtrj = np.array([[ 1, -2, -2,  3,  3,  3],
...                  [-2, -2,  3,  3,  3,  1],
...                  [ 3,  3,  3,  1, -2, -2],
...                  [ 1,  3,  3,  3, -2, -2],
...                  [ 1,  4,  4,  4,  4, -1]])
>>> mdt.dtrj.remain_prob(dtrj)
array([1.        , 0.6       , 0.3       , 0.06666667, 0.        ,
       0.        ])
>>> mdt.dtrj.remain_prob(dtrj, restart=3)
array([1. , 0.5, 0.2, 0. , 0. , 0. ])
>>> mdt.dtrj.remain_prob(dtrj, continuous=True)
array([1.        , 0.6       , 0.3       , 0.06666667, 0.        ,
       0.        ])
>>> mdt.dtrj.remain_prob(dtrj, discard_neg_start=True)
array([1.        , 0.57894737, 0.375     , 0.09090909, 0.        ,
       0.        ])
>>> mdt.dtrj.remain_prob(
...     dtrj, discard_neg_start=True, continuous=True
... )
array([1.        , 0.57894737, 0.375     , 0.09090909, 0.        ,
       0.        ])
>>> mdt.dtrj.remain_prob(dtrj, discard_all_neg=True)
array([1.        , 0.73333333, 0.6       , 0.2       , 0.        ,
              nan])
>>> mdt.dtrj.remain_prob(
...     dtrj, discard_all_neg=True, continuous=True
... )
array([1.        , 0.73333333, 0.6       , 0.2       , 0.        ,
              nan])
>>> dtrj = dtrj.T
>>> mdt.dtrj.remain_prob(dtrj)
array([1.        , 0.375     , 0.11111111, 0.16666667, 0.16666667])
>>> mdt.dtrj.remain_prob(dtrj, restart=2)
array([1.        , 0.58333333, 0.        , 0.33333333, 0.16666667])
>>> mdt.dtrj.remain_prob(dtrj, continuous=True)
array([1.        , 0.375     , 0.05555556, 0.        , 0.        ])
>>> mdt.dtrj.remain_prob(dtrj, discard_neg_start=True)
array([1.        , 0.375     , 0.16666667, 0.25      , 0.25      ])
>>> mdt.dtrj.remain_prob(
...     dtrj, discard_neg_start=True, continuous=True
... )
array([1.        , 0.375     , 0.08333333, 0.        , 0.        ])
>>> mdt.dtrj.remain_prob(dtrj, discard_all_neg=True)
array([1.        , 0.46153846, 0.28571429, 0.33333333, 0.        ])
>>> mdt.dtrj.remain_prob(
...     dtrj, discard_all_neg=True, continuous=True
... )
array([1.        , 0.46153846, 0.14285714, 0.        , 0.        ])