kaplan_meier
- mdtools.dtrj.kaplan_meier(*args, **kwargs)[source]
Estimate the state survival function using the Kaplan-Meier estimator.
Take a discrete trajectory and calculate the probability that a compound is still in the new state at time \(t_0 + \Delta t\) given that a state transition has occurred at time \(t_0\).
- Parameters:
args, kwargs – This function takes the same parameters as
mdtools.dtrj.n_leaves_vs_time()
. See there for more information.- Returns:
sf (
numpy.ndarray
) – Array of shape(f,)
containing the values of the survival function for each possible lag time \(\Delta t\). The i-th element of the array is the probability that a compound is still in the new state i frames after it has entered the state.sf_var (
numpy.ndarray
) – Array of the same shape as sf containing the corresponding variance values of the survival function calculated using Greenwood’s formula.
See also
mdtools.dtrj.kaplan_meier_discrete()
Estimate the state survival function using the Kaplan-Meier estimator resolved with respect to the states in a second discrete trajectory
mdtools.dtrj.n_leaves_vs_time()
Calculate the number of compounds that leave their state after a lag time \(\Delta t\) given that they have entered the state at time \(t_0\)
mdtools.dtrj.remain_prob()
Calculate the probability that a compound is still (or again) in the same state as at time \(t_0\) after a lag time \(\Delta t\)
mdtools.dtrj.trans_rate()
Calculate the transition rate for each compound averaged over all states
mdtools.dtrj.lifetimes()
Calculate the state lifetimes for each compound
Notes
The survival function \(S(t)\) is the probability that a given compound has a state lifetime greater than or equal to \(t\). Alternatively, one might say \(S(t)\) percent of the compounds have a state lifetime greater than or equal to \(t\).
The survival function is related to the underlying distribution of state lifetimes \(f(t)\) by
\[S(t) = \int_t^\infty f(u) \text{ d}u = 1 - F(t)\]where \(F(t) = \int_0^t f(u) \text{ d}u\) is the cumulative distribution function.[1]
The survival function can be used to calculate the raw-moments of the distribution of state lifetimes via the alternative expectation formula:[2]
\[\langle t^n \rangle = \int_0^\infty t^n f(t) \text{ d}t = n \int_0^\infty t^{n-1} S(t) \text{ d}t\]The Kaplan-Meier estimator \(\hat{S}(t)\) of the survival function \(S(t)\) is given by[3]
\[\hat{S}(t) = \Pi_{i: t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right)\]where \(d_i\) is the number of compounds that leave their state at time \(t_i\) and \(n_i\) is the number of compounds that are at risk of leaving their state at time \(t_i\), i.e. the number of compounds that have not left their state and have not been censored up to time \(t_i\).
The estimated variance \(\hat{\text{Var}}(\hat{S}(t))\) of the Kaplan-Meier estimator \(\hat{S}(t)\) is calculated by Greenwood’s formula:[4]
\[\hat{\text{Var}}(\hat{S}(t)) = \hat{S}^2(t) \sum_{i: t_i \leq t} \frac{d_i}{n_i (n_i - d_i)}\]References
Examples
>>> # No detectable leave, two censored lifetimes. >>> dtrj = np.array([2, 2, 5, 5, 5, 5, 5]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([ 1., 1., 1., 1., 1., 1., nan]) >>> sf_var array([ 0., 0., 0., 0., 0., 0., nan]) >>> # One detectable leave, two censored lifetimes. >>> dtrj = np.array([2, 2, 3, 3, 3, 2, 2]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([ 1., 1., 1., 0., nan, nan, nan]) >>> sf_var array([ 0., 0., 0., 0., nan, nan, nan]) >>> # Two detectable leaves, two censored lifetimes. >>> dtrj = np.array([1, 3, 3, 3, 1, 2, 2]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.75, 0.75, 0. , nan, nan, nan]) >>> sf_var array([0. , 0.046875, 0.046875, 0. , nan, nan, nan]) >>> dtrj = np.array([1, 3, 3, 3, 2, 2, 1]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 1. , 0.5, 0. , nan, nan, nan]) >>> sf_var array([0. , 0. , 0.125, 0. , nan, nan, nan]) >>> dtrj = np.array([3, 3, 3, 1, 2, 2, 1]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.75 , 0.375, 0.375, nan, nan, nan]) >>> sf_var array([0. , 0.046875 , 0.08203125, 0.08203125, nan, nan, nan])
>>> dtrj = np.array([[2, 2, 5, 5, 5, 5, 5], ... [2, 2, 3, 3, 3, 2, 2], ... [1, 3, 3, 3, 1, 2, 2], ... [1, 3, 3, 3, 2, 2, 1], ... [3, 3, 3, 1, 2, 2, 1]]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.88235294, 0.72192513, 0.28877005, 0.28877005, 0.28877005, nan]) >>> sf_var array([0. , 0.00610625, 0.01461646, 0.02735508, 0.02735508, 0.02735508, nan]) >>> dtrj = np.array([[1, 2, 2, 3, 3, 3], ... [2, 2, 3, 3, 3, 1], ... [3, 3, 3, 1, 2, 2], ... [1, 3, 3, 3, 2, 2]]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.91666667, 0.80208333, 0.40104167, nan, nan]) >>> sf_var array([0. , 0.00636574, 0.01636194, 0.04429909, nan, nan])
Discarding negative states:
>>> dtrj = np.array([[1, 2, 2, 3, 3, 3], ... [2, 2, 3, 3, 3, 1], ... [3, 3, 3, 1, 2, 2], ... [1, 3, 3, 3, 2, 2], ... [1, 4, 4, 4, 4, 1]]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.93333333, 0.82962963, 0.49777778, 0. , nan]) >>> sf_var array([0. , 0.00414815, 0.01283707, 0.03765904, 0. , nan]) >>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier( ... dtrj, discard_neg_start=True ... ) >>> sf_ns array([1. , 0.93333333, 0.82962963, 0.49777778, 0. , nan]) >>> sf_var_ns array([0. , 0.00414815, 0.01283707, 0.03765904, 0. , nan]) >>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier( ... dtrj, discard_all_neg=True ... ) >>> sf_an array([1. , 0.93333333, 0.82962963, 0.49777778, 0. , nan]) >>> sf_var_an array([0. , 0.00414815, 0.01283707, 0.03765904, 0. , nan]) >>> dtrj = np.array([[ 1, -2, -2, 3, 3, 3], ... [-2, -2, 3, 3, 3, 1], ... [ 3, 3, 3, 1, -2, -2], ... [ 1, 3, 3, 3, -2, -2], ... [ 1, 4, 4, 4, 4, -1]]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.93333333, 0.82962963, 0.49777778, 0. , nan]) >>> sf_var array([0. , 0.00414815, 0.01283707, 0.03765904, 0. , nan]) >>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier( ... dtrj, discard_neg_start=True ... ) >>> sf_ns array([1. , 0.9 , 0.9 , 0.54, 0. , nan]) >>> sf_var_ns array([0. , 0.009 , 0.009 , 0.04212, 0. , nan]) >>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier( ... dtrj, discard_all_neg=True ... ) >>> sf_an array([1. , 1. , 1. , 0.8, 0.8, nan]) >>> sf_var_an array([0. , 0. , 0. , 0.032, 0.032, nan]) >>> dtrj = np.array([[ 1, -2, -2, 3, 3, 3], ... [-2, -2, 3, 3, 3, 1], ... [ 3, 3, 3, 1, -2, -2], ... [ 1, 3, 3, 3, -2, -2], ... [ 1, 4, 4, 4, 4, -1], ... [ 6, 6, 6, 6, 6, 6], ... [-6, -6, -6, -6, -6, -6]]) >>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj) >>> sf array([1. , 0.94117647, 0.85561497, 0.61115355, 0.4074357 , 0.4074357 ]) >>> sf_var array([0. , 0.00325667, 0.0093467 , 0.02611208, 0.03927268, 0.03927268]) >>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier( ... dtrj, discard_neg_start=True ... ) >>> sf_ns array([1. , 0.90909091, 0.90909091, 0.60606061, 0.3030303 , 0.3030303 ]) >>> sf_var_ns array([0. , 0.00751315, 0.00751315, 0.0339483 , 0.05440076, 0.05440076]) >>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier( ... dtrj, discard_all_neg=True ... ) >>> sf_an array([1. , 1. , 1. , 0.83333333, 0.83333333, 0.83333333]) >>> sf_var_an array([0. , 0. , 0. , 0.02314815, 0.02314815, 0.02314815])