plot_energy_dist

Plot statistics about the distribution of energy terms contained in an .edr file (Gromacs energy file).

For each energy term selected with --observables the following plots are created:

  • The evolution of the energy term with time.

  • A histogram showing the distribution of the energy values.

  • The autocorrelation function (ACF) of the energy term with confidence intervals at the confidence level \(1 - \alpha\) (see --alpha).

  • The power spectrum of the energy term, i.e. the absolute square of its discrete Fourier transform.
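The power spectrum definition above can be illustrated with a minimal NumPy sketch. The function name `power_spectrum` and the sampling interval `dt` are assumptions made for this illustration; the script's actual implementation may differ, for example in normalization.

```python
import numpy as np


def power_spectrum(values, dt=1.0):
    """Return frequencies and the absolute square of the DFT of `values`.

    Hypothetical helper for illustration, not part of mdtools.
    """
    values = np.asarray(values, dtype=float)
    # Real-input FFT: only the non-negative frequencies are returned.
    spectrum = np.fft.rfft(values)
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(values.size, d=dt)
    return freqs, power


# Synthetic "energy term": a sine wave with frequency 0.25 (in units of 1/time).
t = np.arange(200) * 0.5
signal = np.sin(2 * np.pi * 0.25 * t)
freqs, power = power_spectrum(signal, dt=0.5)
peak_freq = freqs[np.argmax(power)]
```

For this synthetic signal the spectrum peaks at the frequency of the sine wave.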

Additionally, the following characteristics of the distributions of the selected energy terms are written to file:

  • Number of data points.

  • Sample mean.

  • Median of the sample.

  • Unbiased sample variance.

  • Minimum value of the sample.

  • Maximum value of the sample.

  • Unbiased sample skewness (Fisher-Pearson).

  • Unbiased excess sample kurtosis (Fisher).

  • Biased non-Gaussian parameter.

  • p-value from D’Agostino’s and Pearson’s test for normality.
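Most of the characteristics listed above can be reproduced with NumPy and SciPy. The following is a sketch under assumptions: the helper `describe_sample` and its dictionary keys are hypothetical, and the non-Gaussian parameter is left out because it is computed by mdtools.statistics.ngp(), which is not reproduced here.

```python
import numpy as np
from scipy import stats


def describe_sample(values):
    """Collect summary statistics of a 1D sample (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return {
        "n": values.size,
        "mean": np.mean(values),
        "median": np.median(values),
        # ddof=1 gives the unbiased sample variance.
        "variance": np.var(values, ddof=1),
        "min": np.min(values),
        "max": np.max(values),
        # bias=False applies the bias correction.
        "skewness": stats.skew(values, bias=False),
        # fisher=True returns the excess kurtosis (0 for a normal distribution).
        "kurtosis": stats.kurtosis(values, fisher=True, bias=False),
        # D'Agostino's and Pearson's test for normality.
        "p_value": stats.normaltest(values).pvalue,
    }


rng = np.random.default_rng(0)
summary = describe_sample(rng.normal(size=1000))
```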

Options

-f

The name of the .edr file to read.

--plot-out

Name of the output file that contains the plot.

--stats-out

Name of the output file that contains the characteristics of the distributions of the selected energy terms.

-b

First frame to use from the .edr file. Frame numbering starts at zero. Default: 0.

-e

Last frame to use from the .edr file. This is exclusive, i.e. the last frame read is actually END - 1. A value of -1 means to use the very last frame. Default: -1.

--every

Use every n-th frame from the .edr file. Default: 1.
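The frame selection described by -b, -e and --every resembles Python slicing with an exclusive end. The sketch below assumes that the three options map directly onto a slice; the function `select_frames` is hypothetical and only illustrates the stated semantics.

```python
def select_frames(data, begin=0, end=-1, every=1):
    """Return data[begin:end:every], treating end == -1 as "up to the very last frame".

    Hypothetical helper for illustration, not part of mdtools.
    """
    if end == -1:
        # -1 is a sentinel for "use the very last frame", not a Python negative index.
        end = len(data)
    return data[begin:end:every]


frames = list(range(10))
subset = select_frames(frames, begin=2, end=8, every=2)  # frames 2, 4, 6
everything = select_frames(frames)  # defaults select all frames
```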

--gzipped

If given, the input file is assumed to be compressed with gzip and will be decompressed before processing. Afterwards, the decompressed file is removed.
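A minimal sketch of such a decompression step, using only the Python standard library. The helper name `decompress_gzip` and the use of a temporary file are assumptions; the script may choose the name and location of the decompressed file differently.

```python
import gzip
import os
import shutil
import tempfile


def decompress_gzip(path):
    """Decompress `path` into a temporary file and return its name.

    Hypothetical helper for illustration. The caller is responsible
    for removing the returned file after processing.
    """
    fd, tmp_name = tempfile.mkstemp(suffix=".edr")
    with os.fdopen(fd, "wb") as dst, gzip.open(path, "rb") as src:
        # Stream the data in chunks instead of loading it all into memory.
        shutil.copyfileobj(src, dst)
    return tmp_name
```

A caller would typically wrap the processing in `try`/`finally` and call `os.remove()` on the returned name so that the decompressed file is removed even if the analysis fails.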

--observables

A space-separated list of energy terms to select. The energy terms must be present in the .edr file. If an energy term contains a space, like ‘Kinetic En.’, put it in quotes. 'Time' cannot be selected. Default: ["Potential", "Kinetic En.", "Pressure"]

--print-obs

Only print all energy terms contained in the .edr file and exit.

--diff

Use the difference between consecutive values of the energy term for the analysis rather than the energy term itself.
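The difference between consecutive values can be computed with `numpy.diff`, as in this small sketch. The energy values are made up for illustration. Note that the resulting array is one element shorter than the input.

```python
import numpy as np

# Made-up energy values for illustration.
energies = np.array([-100.0, -98.5, -99.0, -101.5])

# increments[i] = energies[i + 1] - energies[i]
increments = np.diff(energies)
```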

--alpha

Significance level for D’Agostino’s and Pearson’s K-squared test for normality of the distribution of energy values (see scipy.stats.normaltest()) and for the confidence intervals of the ACF (see mdtools.statistics.acf_confint()). The K-squared test requires a sample size of more than 20 data points. Typical values for \(\alpha\) are 0.01 or 0.05. In some cases it is set to 0.1 to reduce the probability of a Type 2 error, i.e. the null hypothesis is not rejected although it is wrong. Here, the null hypothesis is that the data are normally distributed (in case of the K-squared test) or have no autocorrelation (in case of the ACF). For more details about the significance level see mdtools.statistics.acf_confint(). Default: 0.1
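How the significance level enters the normality test can be sketched as follows. The helper `is_normality_rejected` is hypothetical; only `scipy.stats.normaltest()` is a real library call. The ACF confidence intervals are not reproduced here because they are computed by mdtools.statistics.acf_confint().

```python
import numpy as np
from scipy import stats


def is_normality_rejected(values, alpha=0.1):
    """Return (rejected, p_value) for the K-squared normality test.

    Hypothetical helper for illustration. The null hypothesis (data are
    normally distributed) is rejected if the p-value is below alpha.
    """
    statistic, p_value = stats.normaltest(values)
    return p_value < alpha, p_value


# A strongly skewed sample: the null hypothesis should be rejected.
rng = np.random.default_rng(42)
skewed = rng.exponential(size=500)
rejected, p = is_normality_rejected(skewed, alpha=0.1)
```

A larger \(\alpha\) rejects the null hypothesis more readily, which lowers the probability of a Type 2 error at the cost of more false rejections.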

--num-points

Use only the last NUM_POINTS data points when plotting the energy terms vs. time. Must not be negative. If NUM_POINTS is None or greater than the number of available data points, it is set to the number of available data points. Default: None
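The NUM_POINTS rule can be sketched as below. The function `last_points` is a hypothetical illustration of the stated behavior, not the script's code.

```python
def last_points(data, num_points=None):
    """Return the last `num_points` entries of `data` (hypothetical helper)."""
    if num_points is not None and num_points < 0:
        raise ValueError("num_points must not be negative")
    n = len(data)
    if num_points is None or num_points > n:
        # Fall back to all available data points.
        num_points = n
    # Slicing from n - num_points also handles num_points == 0 correctly.
    return data[n - num_points:]


values = list(range(10))
tail = last_points(values, 3)  # the last three values: 7, 8, 9
```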

See also

scipy.stats.skew()

Compute the sample skewness of a data set

scipy.stats.kurtosis()

Compute the kurtosis of a dataset

scipy.stats.normaltest()

Test whether a sample differs from a normal distribution

mdtools.statistics.ngp()

Compute the non-Gaussian parameter of a data set

mdtools.plot.correlogram()

Create and plot a correlogram for a given data set

Notes

The produced plots and distribution characteristics can be used to judge whether the distribution of energy terms is reasonable for the simulated ensemble.