Tags: python, python-3.x, optimization, scipy, curve-fitting

How do I tell whether a curve-fit model is good for my data?


I have a 2D array and I am trying to fit a curve to the data. My objective function is a polynomial:

def objective(x, a, b, c):
    return a * x + b * x**2 + c

I used curve_fit from scipy.optimize to fit a curve to the data, but I need to know how good this model is. What is the difference between the actual data and the estimated curve, and how can I compute it? Does curve_fit use the mean squared error to find the curve? How can I control this difference?


Solution

  • According to the curve_fit documentation, setting the input argument full_output to True makes the function return additional information about the optimization. In particular, it returns a dictionary (infodict) whose entry fvec contains the residuals f(x) - y evaluated at the solution. Moreover, curve_fit always performs a least-squares fit; the method argument only selects the algorithm, defaulting to 'lm' (Levenberg-Marquardt) when no bounds are given and to 'trf' when bounds are provided.

    So, if you want to know the error of the fitted function, you can use this information:

    from scipy.optimize import curve_fit
    import numpy as np
    
    def objective(x, a, b, c):
        return a * x + b * x**2 + c
    
    # synthetic data: the true curve plus Gaussian noise (std 10), seeded for reproducibility
    rng = np.random.default_rng(0)
    x = np.arange(-10, 10, 1)
    data = objective(x, 1, 2, 3) + rng.normal(0, 10, len(x))
    
    popt, pcov, info, msg, ier = curve_fit(objective, x, data, full_output=True, method='lm')
    y_hat = objective(x, *popt)  # fitted curve evaluated at the data points
    
    # residuals are in the entry 'fvec' of the info dict.
    # These are the residuals evaluated at the solution, i.e., f(x) - data
    # (the same values as y_hat - data)
    # Compute the sum of squared residuals, the quantity least squares minimizes
    err = np.dot(info['fvec'], info['fvec'])
    

    This is the plot of the original data and the curve fitted by the previous code:

    [Plot: original data and fitted curve]
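Beyond looking at the plot, the residuals can be summarized with the usual goodness-of-fit numbers. The sketch below is an illustration, not something curve_fit reports itself: it computes the mean squared error (MSE), the root mean squared error (RMSE) and the coefficient of determination R² by hand, plus the parameter standard errors from the diagonal of pcov. The helper name fit_report is hypothetical, and the data are the same kind of synthetic data as above.

```python
import numpy as np
from scipy.optimize import curve_fit


def objective(x, a, b, c):
    return a * x + b * x**2 + c


def fit_report(x, y, popt, pcov):
    """Hypothetical helper: summarize how well objective(x, *popt) fits y."""
    residuals = y - objective(x, *popt)      # actual minus fitted values
    ssr = np.sum(residuals**2)               # sum of squared residuals
    mse = ssr / len(y)                       # mean squared error
    rmse = np.sqrt(mse)                      # same units as y
    sst = np.sum((y - np.mean(y)) ** 2)      # total sum of squares around the mean
    r2 = 1.0 - ssr / sst                     # coefficient of determination
    perr = np.sqrt(np.diag(pcov))            # 1-sigma uncertainty of a, b, c
    return {"ssr": ssr, "mse": mse, "rmse": rmse, "r2": r2, "perr": perr}


# synthetic data: true curve plus Gaussian noise (std 10)
rng = np.random.default_rng(0)
x = np.arange(-10, 10, 1)
data = objective(x, 1, 2, 3) + rng.normal(0, 10, len(x))

popt, pcov = curve_fit(objective, x, data)
report = fit_report(x, data, popt, pcov)
print(f"RMSE = {report['rmse']:.2f}, R^2 = {report['r2']:.3f}")
print("parameter standard errors:", report["perr"])
```

RMSE is in the same units as the data, so it can be compared directly with the noise level you expect. R² is most meaningful for models that, like this one, are linear in their parameters and include a constant term. Note that with the default absolute_sigma=False, pcov is scaled by the residual variance of the fit.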

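    Moreover, the documentation states that with the default method 'lm' (used when no bounds are provided) curve_fit calls leastsq, which minimizes the sum of squares of the residuals. If bounds are provided, curve_fit uses least_squares with method 'trf', which solves a bounded nonlinear least-squares problem. Note that 'lm' does not work when the number of observations is less than the number of parameters; in that case you have to pass method='trf' or method='dogbox' yourself. In every case, minimizing the sum of squared residuals is equivalent to minimizing the mean squared error, because the two differ only by the constant factor 1/n. So yes, curve_fit effectively minimizes the MSE, unless you weight the residuals through the sigma argument, in which case it minimizes sum((r / sigma)**2).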
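One way to check that curve_fit really finds the least-squares solution: because this particular objective is linear in a, b and c, the same least-squares problem can also be solved directly with np.linalg.lstsq. The sketch below is an illustration under that assumption (it does not work for models that are nonlinear in their parameters); the ssr helper is a hypothetical name. It fits the data both ways and compares the parameters and the resulting sums of squared residuals.

```python
import numpy as np
from scipy.optimize import curve_fit


def objective(x, a, b, c):
    return a * x + b * x**2 + c


def ssr(params, x, y):
    """Hypothetical helper: sum of squared residuals of objective(x, *params) against y."""
    return np.sum((y - objective(x, *params)) ** 2)


# synthetic data: true curve plus Gaussian noise (std 10)
rng = np.random.default_rng(1)
x = np.arange(-10, 10, 1)
data = objective(x, 1, 2, 3) + rng.normal(0, 10, len(x))

# Iterative least squares: curve_fit with its default method ('lm', no bounds)
popt, _ = curve_fit(objective, x, data)

# The model is linear in (a, b, c), so the same least-squares problem has a
# direct solution: the design-matrix columns are x, x**2 and a constant 1,
# in the same order as the parameters a, b, c.
A = np.column_stack([x, x**2, np.ones(len(x))])
coef, *_ = np.linalg.lstsq(A, data, rcond=None)

print("curve_fit:", popt)
print("lstsq:    ", coef)
print("SSR:", ssr(popt, x, data), ssr(coef, x, data))
```

The two parameter vectors should agree up to the optimizer's tolerance. For models that are not linear in their parameters there is no such direct check, and curve_fit's iterative result can depend on the starting guess p0.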