Tags: python, numpy, scipy, curve-fitting, curve

Scipy's curve_fit not giving reasonable result


I have a simple x,y data set to fit, at least at first glance. The issue is that scipy.optimize.curve_fit gives back a very large value for one of the parameters fitted and I don't know if this is mathematically correct or if there's something wrong with how I'm fitting the data.

The figure below shows both the data points and the best fit obtained in blue. The curve used (func in the MWE below) has four parameters a, b, c, d being fitted:

  • a gives approximately the x value where the curve attains its half-maximum.
  • b represents the x value where the curve stabilizes; the function's value there is given by the d parameter, i.e.: func(b) = d.
  • c is related to the maximum value of the curve at the origin: func(0) = c*constant + d
  • d is the value at which the curve stabilizes (black line in the figure).

The b parameter is the one I'm having issues with (see end of question); it's also the one I'm most interested in assigning a reasonable value to.

[Figure: data points with the fitted curve in blue and the stabilization level d as a black horizontal line]

The MWE shows the function being fitted and the results:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Function to be fitted.
def func(x, a, b, c, d):
    return c * (1 / np.sqrt(1 + (np.asarray(x) / a) ** 2) -
        1 / np.sqrt(1 + (b / a) ** 2)) ** 2 + d

# Define x,y data.    
x_list = [12.5, 37.5, 62.5, 87.5, 112.5, 137.5, 162.5, 187.5, 212.5, 237.5,
    262.5, 287.5, 312.5, 337.5, 362.5, 387.5, 412.5, 437.5, 462.5, 487.5,
    512.5]
y_list = [0.008, 0.0048, 0.0032, 0.00327, 0.0023, 0.00212, 0.00187,
    0.00086, 0.00070, 0.00100, 0.00056, 0.00076, 0.00052, 0.00077, 0.00067,
    0.00048, 0.00078, 0.00067, 0.00069, 0.00061, 0.00047]

# Initial guess for the 4 parameters.
guess = (50., 200., 80. / 10000., 6. / 10000.)

# Fit curve to x,y data.
f_prof, f_err = curve_fit(func, x_list, y_list, guess)

# Values for the a,b,c,d fitted parameters.
print(f_prof)

# Errors (standard deviations) for the fitted parameters.
print(np.sqrt(f_err[0][0]), np.sqrt(f_err[1][1]), np.sqrt(f_err[2][2]),
    np.sqrt(f_err[3][3]))

# Generate plot.
plt.scatter(x_list, y_list)
plt.plot(x_list, func(x_list, f_prof[0], f_prof[1], f_prof[2], f_prof[3]))
plt.hlines(y=f_prof[3], xmin=0., xmax=max(x_list))
plt.show()

The results I get are:

# a, b, c, d
 52.74, 2.52e+09, 7.46e-03, 5.69e-04

# errors
11.52, 1.53e+16, 0.0028, 0.00042

The b parameter has a huge value and also a huge error. Looking at the data in the figure, one would estimate by eye that b (i.e., the x value where the data set stabilizes) should be around x=300. Why am I getting such a large value for b, and such a large error?


Solution

  • I don't know if this is intentional or a mistake, but it looks to me like 'b' will be strongly correlated with 'a' and 'd' and has no "interaction" with the independent variable 'x'. If b/a is large enough, you can approximate 1 / np.sqrt(1 + (b / a) ** 2) as a/b, so that your function becomes roughly c * (function_of(x, a) - a/b) ** 2 + d (a short numerical sketch at the end of this answer illustrates this).

    Your 'x' values are mostly large compared to 'a', so this becomes very nearly c * (a/x - a/b) ** 2 + d: 'b' only enters through the small constant a/b, which the fit can trade off against the other parameters instead of pinning 'b' down.

    As pointed out by behzad.nouri, curve_fit can be slightly unstable compared to other minimizers, and it always minimizes least-squares. But it does return the full covariance matrix, including the correlations between variables (the off-diagonal elements of your f_err). Use these! (A sketch below shows how to turn f_err into a correlation matrix.)

    If you're sure 'b' has a value around 300, or are interested in easily switching between fmin and the Levenberg-Marquardt algorithm, you might find the lmfit package (http://lmfit.github.io/lmfit-py/) useful. It allows you to put bounds on parameters, easily switch between fitting algorithms, and also do a more brute-force exploration of the confidence intervals for the parameters. A minimal usage sketch closes this answer.
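
    As a rough numerical illustration of the first point, here is a minimal sketch (the value of a is taken from the fit result in the question): once b is much larger than a, the b-dependent term is essentially zero, so wildly different b values produce practically the same curve.

    import numpy as np

    a = 52.74  # 'a' from the fit reported in the question
    # For b >> a, 1 / sqrt(1 + (b/a)**2) collapses towards a/b and then towards 0,
    # so the model barely changes with b and the fit cannot pin it down.
    for b in (300.0, 1e4, 2.52e9):
        exact = 1.0 / np.sqrt(1.0 + (b / a) ** 2)
        print(b, exact, a / b)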
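
    To see those correlations, here is a short sketch that continues from the question's MWE (f_err is the covariance matrix returned by curve_fit there):

    import numpy as np

    perr = np.sqrt(np.diag(f_err))        # 1-sigma uncertainties (the diagonal)
    corr = f_err / np.outer(perr, perr)   # correlation matrix
    print(corr)                           # entries near +/-1 flag degenerate pairs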
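
    And a minimal lmfit sketch, assuming lmfit is installed and reusing func, x_list and y_list from the question; the bound of 600 on b is just an illustrative choice, not something prescribed by lmfit:

    import numpy as np
    from lmfit import Model

    model = Model(func)                              # wrap the question's model
    params = model.make_params(a=50.0, b=200.0, c=0.008, d=0.0006)
    params['b'].set(min=0.0, max=600.0)              # keep b in a sensible range
    result = model.fit(np.asarray(y_list), params, x=np.asarray(x_list))
    print(result.fit_report())                       # best values, errors, correlations
    # Passing method='nelder' to model.fit(...) switches to a simplex (fmin-style)
    # minimizer instead of the default Levenberg-Marquardt.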