kaplan_meier

mdtools.dtrj.kaplan_meier(*args, **kwargs)[source]

Estimate the state survival function using the Kaplan-Meier estimator.

Take a discrete trajectory and calculate the probability that a compound is still in the new state at time \(t_0 + \Delta t\) given that a state transition has occurred at time \(t_0\).

Parameters:

args, kwargs – This function takes the same parameters as mdtools.dtrj.n_leaves_vs_time(). See there for more information.

Returns:

sf (numpy.ndarray) – Array of shape (f,) containing the values of the survival function for each possible lag time \(\Delta t\). The i-th element of the array is the probability that a compound is still in the new state i frames after it has entered the state.
sf_var (numpy.ndarray) – Array of the same shape as sf containing the corresponding variance values of the survival function calculated using Greenwood’s formula.

See also

mdtools.dtrj.kaplan_meier_discrete(): Estimate the state survival function using the Kaplan-Meier estimator resolved with respect to the states in a second discrete trajectory
mdtools.dtrj.n_leaves_vs_time(): Calculate the number of compounds that leave their state after a lag time \(\Delta t\) given that they have entered the state at time \(t_0\)
mdtools.dtrj.remain_prob(): Calculate the probability that a compound is still (or again) in the same state as at time \(t_0\) after a lag time \(\Delta t\)
mdtools.dtrj.trans_rate(): Calculate the transition rate for each compound averaged over all states
mdtools.dtrj.lifetimes(): Calculate the state lifetimes for each compound

Notes

The survival function \(S(t)\) is the probability that a given compound has a state lifetime greater than or equal to \(t\). Alternatively, one might say \(S(t)\) percent of the compounds have a state lifetime greater than or equal to \(t\).

The survival function is related to the underlying distribution of state lifetimes \(f(t)\) by

\[S(t) = \int_t^\infty f(u) \text{ d}u = 1 - F(t)\]

where \(F(t) = \int_0^t f(u) \text{ d}u\) is the cumulative distribution function.[1]

The survival function can be used to calculate the raw-moments of the distribution of state lifetimes via the alternative expectation formula:[2]

\[\langle t^n \rangle = \int_0^\infty t^n f(t) \text{ d}t = n \int_0^\infty t^{n-1} S(t) \text{ d}t\]

The Kaplan-Meier estimator \(\hat{S}(t)\) of the survival function \(S(t)\) is given by[3]

\[\hat{S}(t) = \Pi_{i: t_i \leq t} \left( 1 - \frac{d_i}{n_i} \right)\]

where \(d_i\) is the number of compounds that leave their state at time \(t_i\) and \(n_i\) is the number of compounds that are at risk of leaving their state at time \(t_i\), i.e. the number of compounds that have not left their state and have not been censored up to time \(t_i\).

The estimated variance \(\hat{\text{Var}}(\hat{S}(t))\) of the Kaplan-Meier estimator \(\hat{S}(t)\) is calculated by Greenwood’s formula:[4]

\[\hat{\text{Var}}(\hat{S}(t)) = \hat{S}^2(t) \sum_{i: t_i \leq t} \frac{d_i}{n_i (n_i - d_i)}\]

References

Examples

>>> # No detectable leave, two censored lifetimes.
>>> dtrj = np.array([2, 2, 5, 5, 5, 5, 5])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([ 1.,  1.,  1.,  1.,  1.,  1., nan])
>>> sf_var
array([ 0.,  0.,  0.,  0.,  0.,  0., nan])
>>> # One detectable leave, two censored lifetimes.
>>> dtrj = np.array([2, 2, 3, 3, 3, 2, 2])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([ 1.,  1.,  1.,  0., nan, nan, nan])
>>> sf_var
array([ 0.,  0.,  0.,  0., nan, nan, nan])
>>> # Two detectable leaves, two censored lifetimes.
>>> dtrj = np.array([1, 3, 3, 3, 1, 2, 2])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.  , 0.75, 0.75, 0.  ,  nan,  nan,  nan])
>>> sf_var
array([0.      , 0.046875, 0.046875, 0.      ,      nan,      nan,
            nan])
>>> dtrj = np.array([1, 3, 3, 3, 2, 2, 1])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1. , 1. , 0.5, 0. , nan, nan, nan])
>>> sf_var
array([0.   , 0.   , 0.125, 0.   ,   nan,   nan,   nan])
>>> dtrj = np.array([3, 3, 3, 1, 2, 2, 1])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.   , 0.75 , 0.375, 0.375,   nan,   nan,   nan])
>>> sf_var
array([0.        , 0.046875  , 0.08203125, 0.08203125,        nan,
              nan,        nan])

>>> dtrj = np.array([[2, 2, 5, 5, 5, 5, 5],
...                  [2, 2, 3, 3, 3, 2, 2],
...                  [1, 3, 3, 3, 1, 2, 2],
...                  [1, 3, 3, 3, 2, 2, 1],
...                  [3, 3, 3, 1, 2, 2, 1]])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.        , 0.88235294, 0.72192513, 0.28877005, 0.28877005,
       0.28877005,        nan])
>>> sf_var
array([0.        , 0.00610625, 0.01461646, 0.02735508, 0.02735508,
       0.02735508,        nan])
>>> dtrj = np.array([[1, 2, 2, 3, 3, 3],
...                  [2, 2, 3, 3, 3, 1],
...                  [3, 3, 3, 1, 2, 2],
...                  [1, 3, 3, 3, 2, 2]])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.        , 0.91666667, 0.80208333, 0.40104167,        nan,
              nan])
>>> sf_var
array([0.        , 0.00636574, 0.01636194, 0.04429909,        nan,
              nan])

Discarding negative states:

>>> dtrj = np.array([[1, 2, 2, 3, 3, 3],
...                  [2, 2, 3, 3, 3, 1],
...                  [3, 3, 3, 1, 2, 2],
...                  [1, 3, 3, 3, 2, 2],
...                  [1, 4, 4, 4, 4, 1]])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.        , 0.93333333, 0.82962963, 0.49777778, 0.        ,
              nan])
>>> sf_var
array([0.        , 0.00414815, 0.01283707, 0.03765904, 0.        ,
              nan])
>>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier(
...     dtrj, discard_neg_start=True
... )
>>> sf_ns
array([1.        , 0.93333333, 0.82962963, 0.49777778, 0.        ,
              nan])
>>> sf_var_ns
array([0.        , 0.00414815, 0.01283707, 0.03765904, 0.        ,
              nan])
>>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier(
...     dtrj, discard_all_neg=True
... )
>>> sf_an
array([1.        , 0.93333333, 0.82962963, 0.49777778, 0.        ,
              nan])
>>> sf_var_an
array([0.        , 0.00414815, 0.01283707, 0.03765904, 0.        ,
              nan])
>>> dtrj = np.array([[ 1, -2, -2,  3,  3,  3],
...                  [-2, -2,  3,  3,  3,  1],
...                  [ 3,  3,  3,  1, -2, -2],
...                  [ 1,  3,  3,  3, -2, -2],
...                  [ 1,  4,  4,  4,  4, -1]])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.        , 0.93333333, 0.82962963, 0.49777778, 0.        ,
              nan])
>>> sf_var
array([0.        , 0.00414815, 0.01283707, 0.03765904, 0.        ,
              nan])
>>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier(
...     dtrj, discard_neg_start=True
... )
>>> sf_ns
array([1.  , 0.9 , 0.9 , 0.54, 0.  ,  nan])
>>> sf_var_ns
array([0.     , 0.009  , 0.009  , 0.04212, 0.     ,     nan])
>>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier(
...     dtrj, discard_all_neg=True
... )
>>> sf_an
array([1. , 1. , 1. , 0.8, 0.8, nan])
>>> sf_var_an
array([0.   , 0.   , 0.   , 0.032, 0.032,   nan])
>>> dtrj = np.array([[ 1, -2, -2,  3,  3,  3],
...                  [-2, -2,  3,  3,  3,  1],
...                  [ 3,  3,  3,  1, -2, -2],
...                  [ 1,  3,  3,  3, -2, -2],
...                  [ 1,  4,  4,  4,  4, -1],
...                  [ 6,  6,  6,  6,  6,  6],
...                  [-6, -6, -6, -6, -6, -6]])
>>> sf, sf_var = mdt.dtrj.kaplan_meier(dtrj)
>>> sf
array([1.        , 0.94117647, 0.85561497, 0.61115355, 0.4074357 ,
       0.4074357 ])
>>> sf_var
array([0.        , 0.00325667, 0.0093467 , 0.02611208, 0.03927268,
       0.03927268])
>>> sf_ns, sf_var_ns = mdt.dtrj.kaplan_meier(
...     dtrj, discard_neg_start=True
... )
>>> sf_ns
array([1.        , 0.90909091, 0.90909091, 0.60606061, 0.3030303 ,
       0.3030303 ])
>>> sf_var_ns
array([0.        , 0.00751315, 0.00751315, 0.0339483 , 0.05440076,
       0.05440076])
>>> sf_an, sf_var_an = mdt.dtrj.kaplan_meier(
...     dtrj, discard_all_neg=True
... )
>>> sf_an
array([1.        , 1.        , 1.        , 0.83333333, 0.83333333,
       0.83333333])
>>> sf_var_an
array([0.        , 0.        , 0.        , 0.02314815, 0.02314815,
       0.02314815])