Search code examples
pythonscipycurve-fitting

scipy curve_fit does not find the best fit


I'd like to find out whether there is a peak in my time series or not. To do so, I'm trying to fit the data points to a gauss curve. It works well for thousands of my samples:

good fit

but few are not fitter properly despite there is an obvious peak: wrong fit

(see the very low peak with the highest point around 0.03)

Here's the code:

def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

param, covariance = curve_fit(gauss, x, y)

I've noticed that the magnitude of the y values plays a role in the fitting so I've rescaled the data into <0, 100> interval. This helped but did not solve all the cases. Is there anything else I can do to improve the fitting? Different initialization? Smaller optimization step?

Here are some facts about the data:

  • Every sample has 3-20 data points.
  • The peak (if there is any) must have its highest point inside the span.
  • x axis is from 0 to 20
  • y axis is from 0 to 100

I've browse through other similar questions at stackoverflow but did not find a solution to my problem.

If anybody knows a better solution to determine whether there is a peak in the time series or not, I'd appreciate to hear it in the comments. Anyway, I'd like to know why some curves are not fitted properly.


Solution

  • The curve_fit method provides an option to define the initial function parameters. Choosing the initial height and position of the bell curve to the y and x coordinates of the "highest" point in the series works very well on my data.

    Additionally, curve_fit also allows to set bounds of the function parameters to assure the height of the bell curve does not go below (or above) certain level.

    Following code works for all of my data samples:

    init = [max(y), np.argmax(y), 1.5]
    bounds = ([50, len(x) * 0.05, -20], [150, (len(x) - 1) * 0.95, len(x) / 2])
    param, covariance = curve_fit(gauss, x, y, p0=init, bounds=bounds)
    

    If the parameters are outside the bounds, I catch the exception (ValueError) and assume there is no peak in the time series.