Search code examples
pythonscipyphysicscurve-fitting

Scipy curve_fit returns only ones fot parameters with infinite variance


In this part of a bigger project of mine there has been occurring the same weird thing over and over.

I have multiple datasets from a physisc experiment. One example dataset I'm trying to fit a gaussian inthe 0th and 1st peak in order to get the exact peak distance.

In this shorter code I just copied the sliced arrays I'm inserting into curve_fit. The problem occurs with all my datasets though.

from scipy.optimize import curve_fit
import numpy as np

y0 = np.array([[-3.0500e+01, -2.9500e+01, -2.8500e+01, -2.7500e+01, -2.6500e+01, -2.5500e+01,
  -2.4500e+01, -2.3500e+01, -2.2500e+01, -2.1500e+01, -2.0500e+01, -1.9500e+01,
  -1.8500e+01, -1.7500e+01, -1.6500e+01, -1.5500e+01, -1.4500e+01, -1.3500e+01,
  -1.2500e+01, -1.1500e+01, -1.0500e+01, -9.5000e+00, -8.5000e+00, -7.5000e+00,
  -6.5000e+00, -5.5000e+00, -4.5000e+00, -3.5000e+00, -2.5000e+00, -1.5000e+00,
  -5.0000e-01,  5.0000e-01,  1.5000e+00,  2.5000e+00,  3.5000e+00,  4.5000e+00,
   5.5000e+00,  6.5000e+00,  7.5000e+00,  8.5000e+00,  9.5000e+00,  1.0500e+01,
   1.1500e+01,  1.2500e+01,  1.3500e+01,  1.4500e+01,  1.5500e+01,  1.6500e+01,
   1.7500e+01,  1.8500e+01,  1.9500e+01,  2.0500e+01,  2.1500e+01,  2.2500e+01,
   2.3500e+01,  2.4500e+01,  2.5500e+01,  2.6500e+01,  2.7500e+01,  2.8500e+01,
   2.9500e+01,  3.0500e+01,  3.1500e+01,  3.2500e+01],
 [ 7.8000e+02,  8.6600e+02,  9.6400e+02,  1.1860e+03,  1.3120e+03,  1.4920e+03,
   1.8030e+03,  2.0650e+03,  2.3800e+03,  2.6140e+03,  2.9800e+03,  3.3650e+03,
   3.7640e+03,  4.4290e+03,  4.9220e+03,  5.4010e+03,  6.0790e+03,  6.5280e+03,
   7.4030e+03,  7.9350e+03,  8.8330e+03,  9.6640e+03,  1.0610e+04,  1.1162e+04,
   1.2003e+04,  1.2321e+04,  1.3213e+04,  1.3647e+04,  1.4335e+04,  1.4561e+04,
   1.5014e+04,  1.4779e+04,  1.5083e+04,  1.4826e+04,  1.4431e+04,  1.4119e+04,
   1.3584e+04,  1.2740e+04,  1.2578e+04,  1.1606e+04,  1.0687e+04,  9.9260e+03,
   9.2690e+03,  8.3630e+03,  7.6900e+03,  6.7250e+03,  6.2370e+03,  5.4060e+03,
   4.8850e+03,  4.3590e+03,  3.9200e+03,  3.3580e+03,  2.8770e+03,  2.5370e+03,
   2.3460e+03,  1.9790e+03,  1.7370e+03,  1.5580e+03,  1.3470e+03,  1.2030e+03,
   1.1570e+03,  1.1080e+03,  9.7900e+02,  9.9100e+02]])
y1 =np.array([[   34.5,35.5,36.5, 37.5,38.5, 39.5, 40.5, 41.5, 42.5,     43.5,44.5,45.5, 46.5,47.5, 48.5, 49.5, 50.5, 51.5,     52.5,53.5,54.5, 55.5,56.5, 57.5, 58.5, 59.5, 60.5,     61.5,62.5,63.5, 64.5,65.5, 66.5, 67.5, 68.5, 69.5,     70.5,71.5,72.5, 73.5,74.5, 75.5, 76.5, 77.5, 78.5,     79.5,80.5,81.5, 82.5,83.5, 84.5, 85.5, 86.5, 87.5,     88.5,89.5,90.5, 91.5,92.5, 93.5, 94.5, 95.5, 96.5,     97.5],
 [ 1081.,  1119.,   1242.,   1370.,   1463.,   1639.,   1853.,   2024.,   2289.,   2498.,  2778.,   3300.,   3668.,   3968.,   4362.,   4821.,   5540.,   5879.,   6525.,  6983.,   7458.,   8119.,   8535.,   9181.,   9539.,  10233.,  10510.,  10959., 11092.,  11304.,  11466.,  11455.,  11468.,  11600.,  11095.,  10835.,  10443., 10170.,   9602.,   9249.,   8685.,   8153.,   7490.,   6924.,   6510.,   5902.,  5200.,   4861.,   4428.,   3943.,   3502.,   3179.,   2816.,   2399.,   2178.,  1922.,   1661.,   1486.,   1350.,   1147.,   1043.,    963.,    907.,    824. ]])


def gaussian(x, mu, sigma, h):
    a = h/(sigma*np.sqrt(2*np.pi))
    b = np.exp(-(x-mu)**2/(2*sigma**2))
    return a*b

a, b = curve_fit(gaussian, *y0)
a2, b2 = curve_fit(gaussian, *y1)

print(a)
print(a2)

The results are:

[6.71688722e-01 1.15844848e+01 4.24879860e+05]  # good fit for gaussian in peak 0
[1. 1. 1.] # bad fit for gaussian in peak 1 with completely wrong position

No matter how I set the lower and upper limits for peak 1 it's always return very bad parameters. I also tried setting bounds for mu close to the index returned by find_peaks, didn't work. I also tried giving an inital_guess (for mu the index for peak 1 found with find_peaks and for the other two parameters the values from the gaussian of peak0) for the parameters.

It'd be great if someone could help me. If any more info is needed, which probably is the case, just tell me what you need. I can also send to complete source code, just not sure if that would be to long for this post.


Solution

  • The initial guess provided to curve_fit can significantly impact the fitting process, especially if the data is not well-scaled. If the curve_fit algorithm starts with a poor initial guess, it might converge to a non-optimal solution. Even though you’ve mentioned trying different initial guesses, you could try normalizing your data to see if it improves the fitting process.

    This should be able to fix your fitting problem:

    # Example of using the 'trf' method with normalized data
    def gaussian(x, mu, sigma, h):
        a = h/(sigma*np.sqrt(2*np.pi))
        b = np.exp(-(x-mu)**2/(2*sigma**2))
        return a*b
    
    # Normalize your y data
    y1[1] = (y1[1] - np.min(y1[1])) / (np.max(y1[1]) - np.min(y1[1]))
    
    # Try fitting with different method and adjusted bounds
    p0 = [np.mean(y1[0]), np.std(y1[0]), np.max(y1[1])]  # Initial guess
    bounds = ([np.min(y1[0]), 0, 0], [np.max(y1[0]), np.inf, np.inf])  # Reasonable bounds
    
    a2, b2 = curve_fit(gaussian, *y1, p0=p0, bounds=bounds, method='trf')
    
    print(a2)
    

    The data in y1 might not be well-suited for fitting with the Gaussian model you’re using, or there could be other underlying peaks or noise that are affecting the fit.