lig_change_at_pos_change_blocks

Compare the coordination environment of reference compounds before and after they have changed their position.

Todo

Properly account for periodic boundary conditions in the direction given with --d. At the moment the assignment ‘left to right’ or ‘right to left’ is wrong for jumps accross periodic boundaries.
Allow to choose between center of mass and center of geometry (this feature has to be implemented in mdtools.structure.discrete_pos_trj()).
Finish docstring.

Thereby distinct between four different types of postion changes: ‘left to right’, ‘right to left’, ‘left to right unsuccessful’ and ‘right to left unsuccessful’. Position changes are identified as transitions between blocks of consecutive frames in which the reference compounds stay at the same position.

Options

-f

Trajectory file. See supported coordinate formats of MDAnalysis.

-s

Topology file. See supported topology formats of MDAnalysis.

-o

Output filename. Name of the text file to which to write the statistics about the coordination environment of the reference compounds before and after the position change.

--dtrj-out

Output filename for the discrete trajectory (optional). If provided, the discrete trajectory is written as binary dtrj.npy file in a compressed .npz archive of the given filename. The discrete trajectory is stored as numpy.ndarray of dtype numpy.uint32 and shape (n, f), where n is the number of reference compounds and f is the number of frames. The elements of the discrete trajectory are the states in which a given compound resides at a given frame.

--bins-out

Output filename for the bin edges (optional). If provided, the (average) bin edges used for creating the discrete trajectory are written to a text file of the given filename.

-b

First frame to read from the trajectory. Frame numbering starts at zero. Default: 0.

-e

Last frame to read from the trajectory. This is exclusive, i.e. the last frame read is actually END - 1. A value of -1 means to read the very last frame. Default: -1.

--every

Read every n-th frame from the trajectory. Default: 1.

--ref

Selection string to select the reference group. See MDAnalysis’ selection syntax for possible choices.

--sel

Selection string to select the selection group. See MDAnalysis’ selection syntax for possible choices.

-c

Cutoff distance in Angstrom. A reference and selection atom are considered to be in contact, if their distance is less than or equal to this cutoff.

--refcmp

{‘group’, ‘segments’, ‘residues’, ‘fragments’, ‘atoms’}

The compounds of the reference group whose center of mass positions should be discretized and whose contacts to selection atoms and compounds should be compared before and after a position change. Reference compounds can be ‘group’ (the entire reference group), ‘segments’, ‘residues’, ‘fragments’, or ‘atoms’. Refer to the MDAnalysis’ user guide for an explanation of these terms. Compounds are made whole before calculating their centers of mass. The centers of mass are wrapped back into the primary unit cell before discretizing their positions. Note that in any case, even if REFCMP is e.g. ‘residues’, only the atoms belonging to the reference group are taken into account for contact calculation, even if the compound might comprise additional atoms that are not contained in the reference group. However, the center of mass calculation is done considering all atoms of a compound, including those that are not part of the reference group. Default: 'atoms'

--selcmp

{‘group’, ‘segments’, ‘residues’, ‘fragments’}

The compounds of the selection group to use for calculating the contact histograms. Contacts between reference compounds and selection atoms are always counted. Additionally, contacts between reference and selection compounds are counted, too. Specify here, which compounds to use for the selection group. Note that in any case, even if SELCMP is e.g. ‘residues’, only the atoms belonging to the selection group are taken into account, even if the compound might comprise additional atoms that are not contained in the selection group. Default: 'residues'

--lag

The lag time \(\tau\) (in trajectory frames). The coordination environment of the reference compounds is compared LAG frames before they leave their block and LAG frames after they have entered the new block. LAG must be equal to or greater than zero, but not greater than half of the total number of frames in the trajectory. LAG must be an integer multiple of EVERY. Default: 0

--min-block-size

Minimum block size (in trajectory frames). Blocks of consecutive frames in which a given reference compound stays in the same position bin must comprise at least this many frames in order to be counted as valid block. MIN_BLOCK_SIZE must be greater than LAG and an integer multiple of EVERY. Default: 2*LAG if LAG is not zero, otherwise 2*EVERY.

--max-gap-size

Maximum gap size (in trajectory frames). The gap between two following valid blocks must not be greater than this many frames in order to count the transition between the two valid blocks as valid transition. MAX_GAP_SIZE must be equal to or greater than zero, but should be less than MIN_BLOCK_SIZE. It must be an integer multiple of EVERY. Default: LAG//(2*EVERY)

-d

{‘x’, ‘y’, ‘z’}

Direction. The spatial direction in which to bin the positions of the reference compounds. Default: 'z'

--bin-start

Point (in Angstrom) on the chosen spatial direction to start binning. Note that binning naturally starts at zero (origin of the simulation box). If parsing a start value greater than zero, the first bin interval will be [0, START). In this way you can determine the width of the first bin independently from the other bins. Note that START must lie within the simulation box obtained from the first frame read and it must be smaller than STOP. Default: 0

--bin-end

Point (in Angstrom) on the chosen spatial direction to stop binning. Note that binning naturally ends at lbox + tol (length of the simulation box in the given spatial direction plus a small tolerance to account for the right-open bin interval). If parsing a value less than lbox, the last bin interval will be [STOP, lbox+tol). In this way you can determine the width of the last bin independently from the other bins. Note that STOP must lie within the simulation box obtained from the first frame read and it must be greater than START. Default: lbox + tol

--bin-num

Number of equidistant bins (not bin edges!) to use for discretizing the given spatial direction between START and STOP. Note that two additional bins, [0, START) and [STOP, lbox+tol), are created if START is not zero and STOP is not lbox. Default: 10

--bins

Text file containing custom bin edges (in Angstrom). Bin edges are read from the first column, characters following a ‘#’ are ignored. Bins do not need to be equidistant. All bin edges must lie within the simulation box as obtained from the first frame read. If --bins is given, it takes precedence over all other --bin* flags.

--debug

Run in debug mode.

See also

lig_change_at_pos_change_blocks_hist: Similar to lig_change_at_pos_change_blocks, but additionally keeps track of the position history of the reference compounds but therefore does not resolve the direction of unsuccessful position changes
contact_hist: Calculate the number of contacts between two MDAnalysis AtomGroups
mdtools.structure.discrete_pos_trj(): Function that creates a discrete posotion trajectory

Notes

This scripts works as follows:

In a first step, the center of mass positions of the reference compounds are discretized along the given spatial direction using mdtools.structure.discrete_pos_trj(). The result is a discrete position trajectory for every single reference compound.

In a second step, the discrete position trajectories are searched for blocks of at least MIN_BLOCK_SIZE consecutive frames in which a given reference compound stays in the same position bin. A position change of the reference compound is defined as a transition between two such blocks that are not more than MAX_GAP_SIZE frames appart from each other (therefore a position change is also called ‘block transition’ within this script).

There are four different types of position changes:

The index of the position bin in which the reference compound resides in the first block is less than in the second block. These position changes are classified as ‘left to right’ (or ‘in positive direction’).

The index of the position bin in which the reference compound resides in the first block is greater than in the second block. These position changes are classified as ‘right to left’ (or ‘in negative direction’).

The bin index is the same in both blocks and is less than the bin index in the first frame after the first block ends. These position changes are classified as ‘left to right unsuccessful’. This means the compound leaves the position bin, where it has been for at least MIN_BLOCK_SIZE frames, in positive direction, but returns within MAX_GAP_SIZE frames to its initial position bin and resides there for at least another MIN_BLOCK_SIZE frames.

The bin index is the same in both blocks and is greater than the bin index in the first frame after the first block ends. These position changes are classified as ‘right to left unsuccessful’.

Note that position changes cannot be ‘unsuccessful’ if MAX_GAP_SIZE is zero.

After identification and classification of the position changes, the coordination environment of the reference compounds are compared a given lag time \(\tau\) before and after the position change. Precisely, if the first block ends at time (frame) \(t_0^{(1)}\) and the second block starts at \(t_0^{(2)}\), the coordination environment is compared at times \(t_0^{(1)} - \tau\) and \(t_0^{(2)} + \tau\). To chose an appropriate lag time \(\tau\), you can for instance first apply scripts.discretization.back_jump_prob and scripts.discretization.kaplan_meier on the output of discrete_pos.

Further notes:

The simulation box must be orthogonal, otherwise the discretization of the center of mass positions of the reference compounds does not work. For more details about the discretization see the Notes section of mdtools.structure.discrete_pos_trj().

Examples

TODO