lig_change_at_pos_change_blocks
Compare the coordination environment of reference compounds before and after they have changed their position.
Todo
Properly account for periodic boundary conditions in the direction given with --d. At the moment the assignment ‘left to right’ or ‘right to left’ is wrong for jumps accross periodic boundaries.
Allow to choose between center of mass and center of geometry (this feature has to be implemented in
mdtools.structure.discrete_pos_trj()
).Finish docstring.
Thereby distinct between four different types of postion changes: ‘left to right’, ‘right to left’, ‘left to right unsuccessful’ and ‘right to left unsuccessful’. Position changes are identified as transitions between blocks of consecutive frames in which the reference compounds stay at the same position.
Options
- -f
Trajectory file. See supported coordinate formats of MDAnalysis.
- -s
Topology file. See supported topology formats of MDAnalysis.
- -o
Output filename. Name of the text file to which to write the statistics about the coordination environment of the reference compounds before and after the position change.
- --dtrj-out
Output filename for the discrete trajectory (optional). If provided, the discrete trajectory is written as binary
dtrj.npy
file in a compressed .npz archive of the given filename. The discrete trajectory is stored asnumpy.ndarray
of dtypenumpy.uint32
and shape(n, f)
, wheren
is the number of reference compounds andf
is the number of frames. The elements of the discrete trajectory are the states in which a given compound resides at a given frame.- --bins-out
Output filename for the bin edges (optional). If provided, the (average) bin edges used for creating the discrete trajectory are written to a text file of the given filename.
- -b
First frame to read from the trajectory. Frame numbering starts at zero. Default:
0
.- -e
Last frame to read from the trajectory. This is exclusive, i.e. the last frame read is actually
END - 1
. A value of-1
means to read the very last frame. Default:-1
.- --every
Read every n-th frame from the trajectory. Default:
1
.- --ref
Selection string to select the reference group. See MDAnalysis’ selection syntax for possible choices.
- --sel
Selection string to select the selection group. See MDAnalysis’ selection syntax for possible choices.
- -c
Cutoff distance in Angstrom. A reference and selection atom are considered to be in contact, if their distance is less than or equal to this cutoff.
- --refcmp
{‘group’, ‘segments’, ‘residues’, ‘fragments’, ‘atoms’}
The compounds of the reference group whose center of mass positions should be discretized and whose contacts to selection atoms and compounds should be compared before and after a position change. Reference compounds can be ‘group’ (the entire reference group), ‘segments’, ‘residues’, ‘fragments’, or ‘atoms’. Refer to the MDAnalysis’ user guide for an explanation of these terms. Compounds are made whole before calculating their centers of mass. The centers of mass are wrapped back into the primary unit cell before discretizing their positions. Note that in any case, even if
REFCMP
is e.g. ‘residues’, only the atoms belonging to the reference group are taken into account for contact calculation, even if the compound might comprise additional atoms that are not contained in the reference group. However, the center of mass calculation is done considering all atoms of a compound, including those that are not part of the reference group. Default:'atoms'
- --selcmp
{‘group’, ‘segments’, ‘residues’, ‘fragments’}
The compounds of the selection group to use for calculating the contact histograms. Contacts between reference compounds and selection atoms are always counted. Additionally, contacts between reference and selection compounds are counted, too. Specify here, which compounds to use for the selection group. Note that in any case, even if
SELCMP
is e.g. ‘residues’, only the atoms belonging to the selection group are taken into account, even if the compound might comprise additional atoms that are not contained in the selection group. Default:'residues'
- --lag
The lag time \(\tau\) (in trajectory frames). The coordination environment of the reference compounds is compared
LAG
frames before they leave their block andLAG
frames after they have entered the new block.LAG
must be equal to or greater than zero, but not greater than half of the total number of frames in the trajectory.LAG
must be an integer multiple ofEVERY
. Default:0
- --min-block-size
Minimum block size (in trajectory frames). Blocks of consecutive frames in which a given reference compound stays in the same position bin must comprise at least this many frames in order to be counted as valid block.
MIN_BLOCK_SIZE
must be greater thanLAG
and an integer multiple ofEVERY
. Default:2*LAG
ifLAG
is not zero, otherwise2*EVERY
.- --max-gap-size
Maximum gap size (in trajectory frames). The gap between two following valid blocks must not be greater than this many frames in order to count the transition between the two valid blocks as valid transition.
MAX_GAP_SIZE
must be equal to or greater than zero, but should be less thanMIN_BLOCK_SIZE
. It must be an integer multiple ofEVERY
. Default:LAG//(2*EVERY)
- -d
{‘x’, ‘y’, ‘z’}
Direction. The spatial direction in which to bin the positions of the reference compounds. Default:
'z'
- --bin-start
Point (in Angstrom) on the chosen spatial direction to start binning. Note that binning naturally starts at zero (origin of the simulation box). If parsing a start value greater than zero, the first bin interval will be
[0, START)
. In this way you can determine the width of the first bin independently from the other bins. Note thatSTART
must lie within the simulation box obtained from the first frame read and it must be smaller thanSTOP
. Default:0
- --bin-end
Point (in Angstrom) on the chosen spatial direction to stop binning. Note that binning naturally ends at
lbox + tol
(length of the simulation box in the given spatial direction plus a small tolerance to account for the right-open bin interval). If parsing a value less thanlbox
, the last bin interval will be[STOP, lbox+tol)
. In this way you can determine the width of the last bin independently from the other bins. Note thatSTOP
must lie within the simulation box obtained from the first frame read and it must be greater thanSTART
. Default:lbox + tol
- --bin-num
Number of equidistant bins (not bin edges!) to use for discretizing the given spatial direction between
START
andSTOP
. Note that two additional bins,[0, START)
and[STOP, lbox+tol)
, are created ifSTART
is not zero andSTOP
is notlbox
. Default:10
- --bins
Text file containing custom bin edges (in Angstrom). Bin edges are read from the first column, characters following a ‘#’ are ignored. Bins do not need to be equidistant. All bin edges must lie within the simulation box as obtained from the first frame read. If --bins is given, it takes precedence over all other --bin* flags.
- --debug
Run in debug mode.
See also
lig_change_at_pos_change_blocks_hist
Similar to
lig_change_at_pos_change_blocks
, but additionally keeps track of the position history of the reference compounds but therefore does not resolve the direction of unsuccessful position changescontact_hist
Calculate the number of contacts between two MDAnalysis
AtomGroups
mdtools.structure.discrete_pos_trj()
Function that creates a discrete posotion trajectory
Notes
This scripts works as follows:
In a first step, the center of mass positions of the reference compounds
are discretized along the given spatial direction using
mdtools.structure.discrete_pos_trj()
. The result is a discrete
position trajectory for every single reference compound.
In a second step, the discrete position trajectories are searched for
blocks of at least MIN_BLOCK_SIZE
consecutive frames in which a
given reference compound stays in the same position bin. A position
change of the reference compound is defined as a transition between two
such blocks that are not more than MAX_GAP_SIZE
frames appart from
each other (therefore a position change is also called ‘block
transition’ within this script).
There are four different types of position changes:
The index of the position bin in which the reference compound resides in the first block is less than in the second block. These position changes are classified as ‘left to right’ (or ‘in positive direction’).
The index of the position bin in which the reference compound resides in the first block is greater than in the second block. These position changes are classified as ‘right to left’ (or ‘in negative direction’).
The bin index is the same in both blocks and is less than the bin index in the first frame after the first block ends. These position changes are classified as ‘left to right unsuccessful’. This means the compound leaves the position bin, where it has been for at least
MIN_BLOCK_SIZE
frames, in positive direction, but returns withinMAX_GAP_SIZE
frames to its initial position bin and resides there for at least anotherMIN_BLOCK_SIZE
frames.The bin index is the same in both blocks and is greater than the bin index in the first frame after the first block ends. These position changes are classified as ‘right to left unsuccessful’.
Note that position changes cannot be ‘unsuccessful’ if MAX_GAP_SIZE
is zero.
After identification and classification of the position changes, the
coordination environment of the reference compounds are compared a given
lag time \(\tau\) before and after the position change. Precisely,
if the first block ends at time (frame) \(t_0^{(1)}\) and the second
block starts at \(t_0^{(2)}\), the coordination environment is
compared at times \(t_0^{(1)} - \tau\) and \(t_0^{(2)} + \tau\).
To chose an appropriate lag time \(\tau\), you can for instance
first apply scripts.discretization.back_jump_prob
and
scripts.discretization.kaplan_meier
on the output of
discrete_pos
.
Further notes:
The simulation box must be orthogonal, otherwise the discretization
of the center of mass positions of the reference compounds does not
work. For more details about the discretization see the Notes section
of mdtools.structure.discrete_pos_trj()
.
Examples
TODO