plot_energy_dist

Plot statistics about the distribution of energy terms contained in an .edr file (Gromacs energy file).

For each energy term selected with --observables the following plots are created:

  • The full evolution of the energy term with time including a centered moving average, the cumulative average and the total average.

  • A cutout of the above plot from --cutout-start to --cutout-stop.

  • A histogram showing the distribution of the energy values.

  • The autocorrelation function (ACF) of the energy term with confidence intervals given by \(1 - \alpha\).

  • The power spectrum of the energy term, i.e. the absolute square of its discrete Fourier transform.

Additionally, the following characteristics of the distributions of the selected energy terms are written to file:

  • Number of data points.

  • Sample mean.

  • Median of the sample.

  • Unbiased sample variance.

  • Minimum value of the sample.

  • Maximum value of the sample.

  • Unbiased sample skewness (Fisher-Pearson).

  • Unbiased excess sample kurtosis (Fisher).

  • Biased non-Gaussian parameter.

  • p-value from D’Agostino’s and Pearson’s test for normality.

Options

-f

The name of the .edr file to read.

--plot-out

Output file name for the file that contains the plots.

--stats-out

Output file name for the file that contains the characteristics of the distributions of the selected energy terms.

-b

First frame to use from the .edr file. Frame numbering starts at zero. Default: 0.

-e

Last frame to use from the .edr file. This is exclusive, i.e. the last frame read is actually END - 1. A value of -1 means to use the very last frame. Default: -1.

--every

Use every n-th frame from the .edr file. Default: 1.

--gzipped

If given, the input file is assumed to be compressed with gzip and will be decompressed before processing. Afterwards, the decompressed file is removed.

--observables

A space separated list of energy terms to select. The energy terms must be present in the .edr file. If an energy term contains a space, like ‘Kinetic En.’, put it in quotes. 'Time' is not allowed as selection. Default: ["Potential", "Kinetic En.", "Total Energy"]

--print-obs

Only print all energy terms contained in the .edr file and exit.

--diff

Use the difference between consecutive values of the energy term for the analysis rather than the energy term itself.

--wlen

Window length for calculating the centered moving average. Should be odd for a real centered moving average. Default: 501.

--cutout-start

First frame to show in the cutout. Frame numbering starts at zero. A value of None means start 500 frames before the end. Default: None.

--cutout-stop

Last frame to show in the cutout. This is inclusive, i.e. the last shown frame is indeed CUTOUT_STOP. A value of None means to use the very last frame. Default: None.

--alpha

Significance level for D’Agostino’s and Pearson’s K-squared test for normality of the distribution of energy values (see scipy.stats.normaltest()) and for the confidence intervals of the ACF (see mdtools.statistics.acf_confint()). The K-squared test requires a sample size of more than 20 data points. Typical values for \(\alpha\) are 0.01 or 0.05. In some cases it is set to 0.1 to reduce the probability of a Type 2 error, i.e. the null hypothesis is not rejected although it is wrong. Here, the null hypothesis is that the data are normally distributed (in case of the K-squared test) or have no autocorrelation (in case of the ACF). For more details about the significance level see mdtools.statistics.acf_confint(). Default: 0.1

See also

scipy.stats.skew()

Compute the sample skewness of a data set

scipy.stats.kurtosis()

Compute the kurtosis of a dataset

scipy.stats.normaltest()

Test whether a sample differs from a normal distribution

mdtools.statistics.ngp()

Compute the non-Gaussian parameter of a data set

mdtools.plot.correlogram()

Create and plot a correlogram for a given data set

Notes

The produced plots and distribution characteristics can be used to judge whether the distribution of energy terms is reasonable for the simulated ensemble.

For instance, in the canonical (\(NVT\)) ensemble, the variance of the total energy \(\sigma_{E_{tot}}^2\) is related to the heat capacity \(C_V\) of the system at constant volume.[1]

\[\sigma_{E_{tot}}^2 = k_B T^2 C_V\]

Here, \(k_B\) is the Boltzmann constant and \(T\) is the temperature. Furthermore, the variance of the kinetic energy \(\sigma_{E_{kin}}^2\) is related to the number of degrees of freedom \(f\) of the system.

\[\sigma_{E_{kin}}^2 = \frac{f}{2} \left( k_B T \right)^2\]

The total number of degrees of freedom of \(N\) molecules in \(d\)-dimensional space is usually \(f = Nd\). However, in molecular dynamics simulations, the total momentum of the system and the length of bonds to hydrogen atoms are typically kept constant. Thus, the number of degrees of freedom reduces by \(d\) (for the total momentum) and the number of constraints \(N_c\), such that \(f\) becomes \(f = d(N-1) - N_c\).

In the \(NpT\) ensemble, the ratio of the variance of the simulation box volume \(\sigma_V^2\) to the average simulation box volume \(\langle V \rangle\) is related to the isothermal compressibility \(\kappa_T\) of the system.[1]

\[\frac{\sigma_V^2}{\langle V \rangle} = k_B T \kappa_T\]

References