plot_energy_dist

Plot statistics about the distribution of energy terms contained in an .edr file (Gromacs energy file).

For each energy term selected with --observables the following plots are created:

  • The full evolution of the energy term with time including the cumulative average and the centered moving average.

  • A cutout of the above plot for the last --num-points data points.

  • A histogram showing the distribution of the energy values.

  • The autocorrelation function (ACF) of the energy term with confidence intervals given by \(1 - \alpha\).

  • The power spectrum of the energy term, i.e. the absolute square of its discrete Fourier transform.

Additionally, the following characteristics of the distributions of the selected energy terms are written to file:

  • Number of data points.

  • Sample mean.

  • Median of the sample.

  • Unbiased sample variance.

  • Minimum value of the sample.

  • Maximum value of the sample.

  • Unbiased sample skewness (Fisher-Pearson).

  • Unbiased excess sample kurtosis (Fisher).

  • Biased non-Gaussian parameter.

  • p-value from D’Agostino’s and Pearson’s test for normality.

Options

-f

The name of the .edr file to read.

--plot-out

Output file name for the file that contains the plots.

--stats-out

Output file name for the file that contains the characteristics of the distributions of the selected energy terms.

-b

First frame to use from the .edr file. Frame numbering starts at zero. Default: 0.

-e

Last frame to use from the .edr file. This is exclusive, i.e. the last frame read is actually END - 1. A value of -1 means to use the very last frame. Default: -1.

--every

Use every n-th frame from the .edr file. Default: 1.

--gzipped

If given, the input file is assumed to be compressed with gzip and will be decompressed before processing. Afterwards, the decompressed file is removed.

--observables

A space separated list of energy terms to select. The energy terms must be present in the .edr file. If an energy term contains a space, like ‘Kinetic En.’, put it in quotes. 'Time' is not allowed as selection. Default: ["Potential", "Kinetic En.", "Total Energy"]

--print-obs

Only print all energy terms contained in the .edr file and exit.

--diff

Use the difference between consecutive values of the energy term for the analysis rather than the energy term itself.

--wlen

Window length for calculating the centered moving average. Should be odd for a real centered moving average. Default: 501.

--num-points

The cutout plot show the last NUM_POINTS data points of the selected energy term(s) vs. time. Must not be negative. If NUM_POINTS is greater then the actual number of available data points or None, it is set to the maximum number of available data points. Default: 500.

--alpha

Significance level for D’Agostino’s and Pearson’s K-squared test for normality of the distribution of energy values (see scipy.stats.normaltest()) and for the confidence intervals of the ACF (see mdtools.statistics.acf_confint()). The K-squared test requires a sample size of more than 20 data points. Typical values for \(\alpha\) are 0.01 or 0.05. In some cases it is set to 0.1 to reduce the probability of a Type 2 error, i.e. the null hypothesis is not rejected although it is wrong. Here, the null hypothesis is that the data are normally distributed (in case of the K-squared test) or have no autocorrelation (in case of the ACF). For more details about the significance level see mdtools.statistics.acf_confint(). Default: 0.1

See also

scipy.stats.skew()

Compute the sample skewness of a data set

scipy.stats.kurtosis()

Compute the kurtosis of a dataset

scipy.stats.normaltest()

Test whether a sample differs from a normal distribution

mdtools.statistics.ngp()

Compute the non-Gaussian parameter of a data set

mdtools.plot.correlogram()

Create and plot a correlogram for a given data set

Notes

The produced plots and distribution characteristics can be used to judge wether the distribution of energy terms is reasonable for the simulated ensemble.