plot_energy_dist
Plot statistics about the distribution of energy terms contained in an .edr file (Gromacs energy file).
For each energy term selected with --observables, the following plots are created:
The evolution of the energy term with time.
A histogram showing the distribution of the energy values.
The autocorrelation function (ACF) of the energy term with confidence intervals at the confidence level \(1 - \alpha\).
The power spectrum of the energy term, i.e. the absolute square of its discrete Fourier transform.
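As a rough illustration of the last two plots, the following NumPy sketch computes a power spectrum and a normalized ACF for a synthetic series. It is not the script's implementation: the helper names `power_spectrum` and `acf`, the synthetic data, and the simple white-noise confidence band \(\pm z_{1-\alpha/2} / \sqrt{N}\) are assumptions made for illustration, and mdtools.statistics.acf_confint() may use a different estimate.

```python
from statistics import NormalDist

import numpy as np


def power_spectrum(x, dt=1.0):
    """Return frequencies and |DFT|^2 of a real-valued series (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    amplitudes = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs, np.abs(amplitudes) ** 2


def acf(x):
    """Return the normalized ACF for all non-negative lags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Full linear autocovariance; keep only the non-negative lags.
    cov = np.correlate(x, x, mode="full")[n - 1:]
    return cov / cov[0]


# Synthetic "energy" series: an oscillation with a period of 50 frames plus noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
energy = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

freqs, power = power_spectrum(energy)
corr = acf(energy)

# Assumed white-noise band: ACF values inside +/- band are compatible with
# "no autocorrelation" at significance level alpha.
alpha = 0.1
band = NormalDist().inv_cdf(1 - alpha / 2) / np.sqrt(energy.size)

# Dominant non-zero frequency (skip the zero-frequency bin).
peak_freq = freqs[np.argmax(power[1:]) + 1]
```

In this sketch, `peak_freq` recovers the oscillation frequency of 1/50 per frame, and ACF values outside `[-band, band]` would indicate autocorrelation at the chosen significance level.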
Additionally, the following characteristics of the distributions of the selected energy terms are written to file:
Number of data points.
Sample mean.
Median of the sample.
Unbiased sample variance.
Minimum value of the sample.
Maximum value of the sample.
Unbiased sample skewness (Fisher-Pearson).
Unbiased excess sample kurtosis (Fisher).
Biased non-Gaussian parameter.
p-value from D’Agostino’s and Pearson’s test for normality.
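Most of these characteristics can be reproduced with NumPy and SciPy. The sketch below is a minimal illustration, not the script's code: the helper name `describe`, the dictionary keys, and the synthetic sample are assumptions. The non-Gaussian parameter is left out because its definition lives in mdtools.statistics.ngp().

```python
import numpy as np
from scipy import stats


def describe(sample, alpha=0.1):
    """Collect summary statistics of a 1-D sample in a dict (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)
    # D'Agostino's and Pearson's K-squared test; null hypothesis: normality.
    _, p_value = stats.normaltest(x)
    return {
        "n": x.size,
        "mean": x.mean(),
        "median": np.median(x),
        "variance": x.var(ddof=1),  # unbiased: divide by n - 1
        "min": x.min(),
        "max": x.max(),
        "skewness": stats.skew(x, bias=False),
        "excess_kurtosis": stats.kurtosis(x, fisher=True, bias=False),
        "p_value": p_value,
        # Normality is rejected at level alpha if the p-value falls below it.
        "normality_rejected": p_value < alpha,
    }


# Synthetic stand-in for an energy term read from an .edr file.
rng = np.random.default_rng(42)
summary = describe(rng.normal(loc=-1.0e5, scale=250.0, size=5000))
for name, value in summary.items():
    print(f"{name:>20}: {value}")
```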
Options
- -f
The name of the .edr file to read.
- --plot-out
Output file name for the file that contains the plot.
- --stats-out
Output file name for the file that contains the characteristics of the distributions of the selected energy terms.
- -b
First frame to use from the .edr file. Frame numbering starts at zero. Default: 0.
- -e
Last frame to use from the .edr file. This is exclusive, i.e. the last frame read is actually END - 1. A value of -1 means to use the very last frame. Default: -1.
- --every
Use every n-th frame from the .edr file. Default: 1.
- --gzipped
If given, the input file is assumed to be compressed with gzip and will be decompressed before processing. Afterwards, the decompressed file is removed.
- --observables
A space-separated list of energy terms to select. The energy terms must be present in the .edr file. If an energy term contains a space, like ‘Kinetic En.’, put it in quotes. 'Time' is not allowed as a selection. Default: ["Potential", "Kinetic En.", "Pressure"]
- --print-obs
Only print all energy terms contained in the .edr file and exit.
- --diff
Use the difference between consecutive values of the energy term for the analysis rather than the energy term itself.
- --alpha
Significance level for D’Agostino’s and Pearson’s K-squared test for normality of the distribution of energy values (see scipy.stats.normaltest()) and for the confidence intervals of the ACF (see mdtools.statistics.acf_confint()). The K-squared test requires a sample size of more than 20 data points. Typical values for \(\alpha\) are 0.01 or 0.05. In some cases it is set to 0.1 to reduce the probability of a Type 2 error, i.e. the null hypothesis is not rejected although it is wrong. Here, the null hypothesis is that the data are normally distributed (in case of the K-squared test) or have no autocorrelation (in case of the ACF). For more details about the significance level see mdtools.statistics.acf_confint(). Default: 0.1
- --num-points
Use only the last NUM_POINTS data points when plotting the energy terms vs. time. Must not be negative. If NUM_POINTS is greater than the actual number of available data points or None, it is set to the maximum number of available data points. Default: None
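The frame-selection options behave much like Python slicing. The following sketch mirrors the documented semantics of -b, -e, --every, --diff and --num-points on a plain array; the helper names `select_frames` and `last_points` are hypothetical, and the script's internal order of operations is not taken from its source.

```python
import numpy as np


def select_frames(values, begin=0, end=-1, every=1):
    """Mimic -b/-e/--every: keep frames begin, begin + every, ... below end."""
    values = np.asarray(values)
    # Per the option description, -1 means "up to and including the last frame".
    stop = len(values) if end == -1 else end
    return values[begin:stop:every]


def last_points(values, num_points=None):
    """Mimic --num-points: keep only the last num_points values."""
    values = np.asarray(values)
    if num_points is None or num_points > len(values):
        num_points = len(values)
    if num_points < 0:
        raise ValueError("num_points must not be negative")
    return values[len(values) - num_points:]


energy = np.arange(10.0) ** 2  # stand-in for one energy term, 10 frames

subset = select_frames(energy, begin=2, end=8, every=2)  # frames 2, 4, 6
everything = select_frames(energy)                       # defaults keep all frames
differences = np.diff(energy)                            # what --diff would analyze
tail = last_points(energy, num_points=3)                 # frames 7, 8, 9
```

Note that `np.diff` returns one value fewer than its input, so an analysis of differences works with N - 1 data points.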
See also
scipy.stats.skew()
Compute the sample skewness of a data set
scipy.stats.kurtosis()
Compute the kurtosis of a dataset
scipy.stats.normaltest()
Test whether a sample differs from a normal distribution
mdtools.statistics.ngp()
Compute the non-Gaussian parameter of a data set
mdtools.plot.correlogram()
Create and plot a correlogram for a given data set
Notes
The produced plots and distribution characteristics can be used to judge whether the distribution of energy terms is reasonable for the simulated ensemble.