correct_intermittency

mdtools.dynamics.correct_intermittency(list_of_arrays, intermittency, sparse2array=True, inplace=True, verbose=True, debug=False)[source]

Correct for intermittent behavior of discrete variables stored in a sequence of arrays.

Analogue of MDAnalysis.lib.correlations.correct_intermittency() for sequences of arbitrarily shaped arrays (e.g. a list of contact matrices as generated by mdtools.structure.contact_matrices()).

Fill gaps between same values with a gap size smaller or equal to a given maximum gap size with the enclosing values.

Parameters:
  • list_of_arrays (tuple or list or numpy.ndarray) – Sequence of arrays (usually one array per trajectory frame). All arrays in the sequence must have the same shape and must be either NumPy arrays or SciPy sparse matrices. A mix of NumPy arrays and SciPy sparse matrices is not possible.

  • intermittency (int) – The maximum allowed gap size. All gaps with a size smaller or equal to the maximum gap size are considered to be negligible and are filled with their enclosing values. If intermittency is less than one, list_of_arrays will be returned without any changes.

  • sparse2array (bool, optional) – If True, SciPy sparse matrices are internally converted on the fly to NumPy arrays for the computations and back again to SciPy sparse matrices using the sliding window method. The window size is automatically determined by the intermittency value. This conversion can increase the performance considerably, but may also lead to a higher memory consumption. sparse2array is ignored, if list_of_arrays is already a sequence of NumPy arrays. The back conversion to SciPy sparse matrices will always convert back to the format of the first sparse matrix in list_of_arrays.

  • inplace (bool, optional) – If True, modify list_of_arrays inplace instead of creating a copy of list_of_arrays.

  • verbose (bool, optional) – If True, print progress information to standard output.

  • debug (bool, optional) – If True, run in debug mode.

    Deprecated since version 0.0.0.dev0: This argument is without use and will be removed in a future release.

Returns:

list_of_arrays (tuple or list or numpy.ndarray) – The input sequence of arrays with gaps of size <=intermittency between same values filled with the enclosing values.

See also

mdtools.dynamics.correct_intermittency_1d()

Correct for intermittent behavior of discrete variables stored in a 1-dimensional array

mdtools.dynamics.replace_short_sequences()

Replace consecutive occurrences of the same value that are shorter than a given minimum length by a given value

mdtools.dynamics.replace_short_sequences_global()

TODO

Notes

Possible use case of this function: Calculating lifetimes from the autocorrelation of contact matrices (autocorrelation of the existence function). Lifetimes may be calculated either with a strict continuous requirement or a less strict intermittency. If you are for instance calculating the lifetime of a ligand around a metal ion, in the former case the ligand must be within a cutoff distance of the metal ion at every frame from \(t_0\) to \(t_0 + \tau\) in order to be considered present at \(t_0 + \tau\). In the intermittent case, the ligand is allowed to leave the region of interest for up to a specified consecutive number of frames whilst still being considered present at \(t_0 + \tau\).

For example, if a ligand is absent for a number of frames equal to or smaller than the parameter intermittency, then this absence will be removed and thus the ligand is considered to have not left. For instance, T,F,F,T with intermittency set to 2 will be replaced by T,T,T,T, where T=True=present, F=False=absent. On the other hand, F,T,T,F will be replaced by F,F,F,F. Note the following behaviour when the elements of the arrays in list_of_arrays can take more than two different values: 0,1,1,0,1,1,2 will be replaced by 0,0,0,0,1,1,2 with intermittency=2, because the last 1,1 block is not enclosed by the same value. If you want 0,1,1,0,1,1,2 to be replaced by 0,0,0,0,0,0,0, take a look at mdtools.dynamics.replace_short_sequences().

If you want to apply this function on a 1-dimensional array, parse the array as numpy.expand_dims(array, axis=0).T and convert the output back to a 1-dimensional array via np.squeeze(result.T). However, we recommend to use mdtools.dynamics.correct_intermittency_1d() instead, which is a fast version of mdtools.dynamics.correct_intermittency() for 1-dimensional arrays.

The intermittency value must not exceed 65534 (np.uint16(-1) - 1). If you really need a higher intermittency value, change the dtype of seen_nframes_ago in the source code of this function. Be aware that high intermittency values increase the computational cost. Ask yourself if you could not better read-in only every intermittency/2-th frame and set intermittency to one or even read-in only every intermittency-th frame and set intermittency to zero.

Examples

>>> a = np.array([[0, 0, 0],
...               [1, 1, 1]])
>>> b = np.array([[1, 1, 1],
...               [0, 0, 0]])
>>> c = np.array([[1, 0, 1],
...               [0, 1, 0]])
>>> d = np.array([[0, 1, 0],
...               [1, 0, 1]])
>>> x = (a, b, c, d)
>>> y = mdt.dyn.correct_intermittency(x,
...                                   intermittency=1,
...                                   inplace=False,
...                                   verbose=False)
>>> for arr in y:
...     print(arr)
[[0 0 0]
 [1 1 1]]
[[1 0 1]
 [0 1 0]]
[[1 0 1]
 [0 1 0]]
[[0 1 0]
 [1 0 1]]
>>> y = mdt.dyn.correct_intermittency(x,
...                                   intermittency=2,
...                                   inplace=False,
...                                   verbose=False)
>>> for arr in y:
...     print(arr)
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 1 0]
 [1 0 1]]

An intermittency value larger than len(list_of_arrays)-2 leads to the same result as an intermittency value of len(list_of_arrays)-2. The first and the last arrays are never changed:

>>> y = mdt.dyn.correct_intermittency(x,
...                                   intermittency=len(x)-1,
...                                   inplace=False,
...                                   verbose=False)
>>> for arr in y:
...     print(arr)
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 1 0]
 [1 0 1]]
>>> np.array_equal(y[0], a)
True
>>> np.array_equal(y[3], d)
True

If intermittency is less than one, the input list_of_arrays is returned without any changes:

>>> y = mdt.dyn.correct_intermittency(x,
...                                   intermittency=0,
...                                   inplace=False,
...                                   verbose=False)
>>> all(np.array_equal(x[i], arr) for i, arr in enumerate(y))
True

Using SciPy sparse matrices:

>>> from scipy.sparse import csr_matrix
>>> x_sparse = [csr_matrix(arr) for arr in x]
>>> y = mdt.dyn.correct_intermittency(x_sparse,
...                                   intermittency=2,
...                                   sparse2array=False,
...                                   inplace=False,
...                                   verbose=False)
>>> for csr_mat in y:
...     print(csr_mat.toarray())
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 1 0]
 [1 0 1]]
>>> y = mdt.dyn.correct_intermittency(x_sparse,
...                                   intermittency=2,
...                                   sparse2array=True,
...                                   inplace=False,
...                                   verbose=False)
>>> for csr_mat in y:
...     print(csr_mat.toarray())
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 0 0]
 [1 1 1]]
[[0 1 0]
 [1 0 1]]

Inplace modification:

>>> y = mdt.dyn.correct_intermittency(x,
...                                   intermittency=2,
...                                   inplace=True,
...                                   verbose=False)
>>> x is y
True