find_linear_region

mdtools.numpy_helper_functions.find_linear_region(data, window_length, polyorder=2, delta=1, tol=0.1, axis=-1, correct_intermittency=None, visualize=False)

Find linear regions in a data array. A Savitzky-Golay differentiation filter is applied to the array using scipy.signal.savgol_filter(), and the linear regions are identified as the positions where the resulting second derivative is zero within a given tolerance. The sample points of the data must be equidistant. For a brief introduction to Savitzky-Golay filters see, for instance, https://eigenvector.com/wp-content/uploads/2020/01/SavitzkyGolay.pdf
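The procedure described above can be sketched by calling scipy.signal.savgol_filter() directly. This is only a minimal illustration, not the mdtools implementation: the test signal, the variable names and the absolute-value comparison against tol are assumptions made for the example.

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical test signal on an equidistant grid:
# linear for x < 5, linear plus a quadratic term for x >= 5.
x = np.linspace(0, 10, 101)
delta = x[1] - x[0]
data = np.where(x < 5, 2 * x + 1, 2 * x + 1 + (x - 5) ** 2)

# Second derivative from a Savitzky-Golay differentiation filter.
d2 = savgol_filter(data, window_length=11, polyorder=2, deriv=2, delta=delta)

# Assumption: "zero within a given tolerance" is read as |d2| <= tol.
tol = 0.1
linear = np.abs(d2) <= tol

print(linear[:40].all())  # the purely linear part is flagged as linear
print(linear[60:].any())  # the quadratic part is not
```

Close to x = 5 the filter window covers both regimes, so the estimated second derivative changes gradually from 0 to 2 there rather than jumping.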

Parameters:
  • data (array_like) – The data in which to identify linear regions.

  • window_length (int) – The length of the filter window. window_length must be a positive odd integer and it must be less than or equal to the size of data. Note that the window size should be smaller than the smallest characteristic you want to resolve. Otherwise these characteristics will be smoothed out. Keep in mind that the purpose of smoothing is to get rid of random noise while (ideally) preserving the true signal.

  • polyorder (int, optional) – The order of the polynomial used to fit the samples. polyorder must be less than window_length but at least two, because otherwise the second derivative of the polynomial will always be zero. Keep in mind that you are fitting window_length data points with a polynomial of order polyorder. Thus, to avoid overfitting, polyorder should be considerably smaller than window_length. If polyorder is too high, no smoothing effect will be seen.

  • delta (float, optional) – The spacing of the samples to which the filter will be applied.

  • tol (float, optional) – Tolerance within which the second derivative is considered to be zero.

  • axis (int, optional) – The axis of the array data along which the filter is to be applied. Default is the last axis.

  • correct_intermittency (int, optional) – Apply mdtools.dynamics.correct_intermittency_1d() to the result array to eliminate numerical fluctuations in the result. A value smaller than or equal to zero leaves the result unchanged. With a value of one or higher, a result array of T,T,F,T,T, for example, is changed to T,T,T,T,T. If None (the default), window_length//2 is used. See mdtools.dynamics.correct_intermittency_1d() for more details.

  • visualize (bool, optional) – Visualize the result of applying the Savitzky-Golay filter to the data. This should only be used in interactive sessions, since the plot window must be closed manually by the user.
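The effect of correct_intermittency on the T,T,F,T,T example can be illustrated with a small stand-in. The helper fill_short_gaps below is hypothetical and is not mdtools.dynamics.correct_intermittency_1d(); it only reproduces the case described above, in which a short run of False enclosed by True on both sides is set to True. The real function may behave differently in other situations.

```python
import numpy as np


def fill_short_gaps(mask, max_gap):
    """Hypothetical helper: set runs of False that are at most
    `max_gap` long and enclosed by True on both sides to True."""
    mask = np.asarray(mask, dtype=bool).copy()
    if max_gap <= 0:
        # Mirrors the documented behaviour: values <= 0 change nothing.
        return mask
    n = len(mask)
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        start = i
        while i < n and not mask[i]:
            i += 1
        # mask[start:i] is a run of False values.
        enclosed = start > 0 and i < n
        if enclosed and (i - start) <= max_gap:
            mask[start:i] = True
    return mask


result = fill_short_gaps([True, True, False, True, True], max_gap=1)
print(result)
```

Runs of False at the start or end of the array are left untouched in this sketch, because they are not enclosed by True on both sides.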

Returns:

linear (numpy.ndarray) – Boolean array. A True element indicates that the second derivative of data is zero within the given tolerance at the respective position. Therefore, the data can be considered linear at this position.
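A Boolean array like the one returned here is often easier to work with as a list of index ranges. The following sketch shows one possible way to do that conversion for a 1-dimensional array; the helper name true_runs and the example array are assumptions and not part of mdtools.

```python
import numpy as np


def true_runs(linear):
    """Hypothetical helper: return an array of (start, stop) index
    pairs, one per contiguous run of True in a 1-d Boolean array.
    `stop` is exclusive, so linear[start:stop] is all True."""
    linear = np.asarray(linear, dtype=bool)
    # Pad with False so that runs touching the array ends are detected.
    padded = np.concatenate(([False], linear, [False]))
    # Positions where the value changes mark the run boundaries.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2)


linear = np.array([True, True, False, False, True, True, True, False])
runs = true_runs(linear)
print(runs)
```

Each row of runs can be used directly as a slice into the original data, for example to fit a straight line to every linear region separately.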