contact_hist_refcmp_same_selcmp

mdtools.structure.contact_hist_refcmp_same_selcmp(cm, natms_per_refcmp=1, natms_per_selcmp=1, minlength=0, dtype=int)[source]

Bin the number of contacts that reference compounds establish to the same selection compound into a histogram.

A compound is usually a chemically meaningful subgroup of an AtomGroup. This can e.g. be a Segment, Residue, fragment or a single Atom. Refer to the MDAnalysis’ user guide for an explanation of these terms. Note that in any case, only Atoms belonging to the original AtomGroup are taken into account, even if the compound might comprise additional Atoms that are not contained in the original AtomGroup.

Parameters:
  • cm (array_like) – (Boolean) contact matrix of shape (m, n) as e.g. generated by mdtools.structure.contact_matrix(), where m is the number of reference Atoms and n is the number of selection Atoms. Alternatively, cm can already be a compound contact count matrix as e.g. generated by mdtools.structure.cmp_contact_count_matrix(). In this case, you probably want to set natms_per_refcmp and natms_per_selcmp to 1, to keep cm unaltered.

  • natms_per_refcmp (int or array_like, optional) – Number of Atoms per reference compound. Can be a single integer or an array of integers. If natms_per_refcmp is a single integer, all reference compounds are assumed to contain the same number of Atoms. In this case, natms_per_refcmp must be an integer divisor of cm.shape[0]. If natms_per_refcmp is an array of integers, it must contain the number of reference Atoms for each single reference compound. In this case, sum(natms_per_refcmp) must be equal to cm.shape[0].

  • natms_per_selcmp (int or array_like, optional) – Same for selection compounds (natms_per_selcmp is checked against cm.shape[1]).

  • minlength (int, optional) – A minimum number of bins for the output array. The output array will have at least this number of elements, though it will be longer if necessary.

  • dtype (dtype, optional) – Data type of the output array.

Returns:

hist_refcmp_same_selcmp (numpy.ndarray) – Histogram of the number of contacts that reference compounds establish to the same selection compound. Different selection compounds that are connected to the same reference compound via the same number of “bonds” (Atom- Atom contacts) are not taken into account.

See also

mdtools.structure.contact_matrix()

Construct a boolean contact matrix for two MDAnalysis AtomGroups

mdtools.structure.natms_per_cmp()

Get the number of Atoms of each compound in an MDAnalysis AtomGroup

mdtools.structure.contact_hists()

Bin the number of contacts between reference and selection compounds into histograms.

mdtools.structure.contact_hist_refcmp_diff_selcmp()

Bin the number of contacts that reference compounds establish to different selection compounds into a histogram.

mdtools.structure.contact_hist_refcmp_selcmp_tot()

Bin the total number of contacts that reference compounds establish to selection compounds into a histogram.

mdtools.structure.contact_hist_refcmp_selcmp_pair()

Bin the number of “bonds” (Atom- Atom contacts) between pairs of reference and selection compounds

Notes

Atoms belonging to the same compound must form a contiguous set in the input contact matrix, otherwise the result will be wrong.

About the output array:

hist_refcmp_same_selcmp

The first element is the number of reference compounds having no contact with any selection compound, the second element is the number of reference compounds having contact with at least one selection compound via exactly one “bond”, the third element is the number of reference compounds having contact with at least one selection compound via exactly two “bonds”, and so on.

Important: Different selection compounds that are connected to the same reference compound via the same number of “bonds” are not taken into account. For instance, if a reference compound is connected to two different selection compounds via one “bond”, respectively, only the first selection compound is counted. However, if the reference compound is connected to the first selection compound via one “bond” and to the second selection compound via two “bonds”, both selection compounds are counted.

The sum of all histogram elements might therefore exceed the number of reference compounds, because a single reference compound can be connected to different selection compounds with different numbers of “bonds”. However, each histogram element on its own cannot exceed the number of reference compounds, because different selection compounds that are connected to the same reference compound via the same number of “bonds” are not taken into account.

Hence it is e.g. possible to say that 100 % of the reference compounds are coordinated monodentately by selection compounds while at the same time 50 % of the reference compounds are additionally coordinated bidentately.

This behavior is complementary to the histogram returned by mdtools.structure.contact_hist_refcmp_selcmp_pair().

If both natms_per_refcmp and natms_per_selcmp are 1 and cm is a true boolean contact matrix, hist_refcmp_same_selcmp is equal to [x, cm.shape[0]-x], where x is the number of reference compounds having no contact with any selection compound.

Examples

>>> cm = np.tril(np.ones((5,4), dtype=bool), -1)
>>> cm[3, 0] = 0
>>> cm
array([[False, False, False, False],
       [ True, False, False, False],
       [ True,  True, False, False],
       [False,  True,  True, False],
       [ True,  True,  True,  True]])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(cm)
array([1, 4])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(cm=cm, minlength=4)
array([1, 4, 0, 0])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(cm=cm, minlength=1)
array([1, 4])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(cm=cm, dtype=np.uint32)
array([1, 4], dtype=uint32)
>>> hist = mdt.strc.contact_hist_refcmp_same_selcmp(cm)
>>> hist[1] == cm.shape[0] - hist[0]
True
>>> mdt.strc.cmp_contact_count_matrix(
...     cm=cm, natms_per_refcmp=[2, 2, 1]
... )
array([[1, 0, 0, 0],
       [1, 2, 1, 0],
       [1, 1, 1, 1]])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm=cm, natms_per_refcmp=[2, 2, 1]
... )
array([0, 3, 1])
>>> mdt.strc.cmp_contact_count_matrix(cm=cm, natms_per_selcmp=2)
array([[0, 0],
       [1, 0],
       [2, 0],
       [1, 1],
       [2, 2]])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm=cm, natms_per_selcmp=2
... )
array([1, 2, 2])
>>> mdt.strc.cmp_contact_count_matrix(
...     cm=cm, natms_per_refcmp=[2, 2, 1], natms_per_selcmp=2
... )
array([[1, 0],
       [3, 1],
       [2, 2]])
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm=cm, natms_per_refcmp=[2, 2, 1], natms_per_selcmp=2
... )
array([0, 2, 1, 1])

Edge cases:

>>> cm = np.array([], dtype=bool).reshape(0, 4)
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm, natms_per_refcmp=[]
... )
array([], dtype=int64)
>>> mdt.strc.contact_hist_refcmp_same_selcmp(cm, natms_per_refcmp=1)
array([], dtype=int64)
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm, natms_per_refcmp=[], minlength=2, dtype=np.uint32
... )
array([0, 0], dtype=uint32)
>>> cm = np.array([], dtype=bool).reshape(6, 0)
>>> mdt.strc.contact_hist_refcmp_same_selcmp(
...     cm, natms_per_refcmp=3, natms_per_selcmp=[]
... )
array([2])