I have a 2D array and I am trying to fit a curve to the data. My objective function is a polynomial:
def objective(x, a, b, c):
    return a * x + b * x**2 + c
I used curve_fit from scipy.optimize to find a suitable curve for the data. But I need to know how good this model is: what is the difference between the actual data and the estimated curve, and how can I find it? Does curve_fit use mean squared error to find the curve? How can I control this difference?
According to the documentation of curve_fit, setting the input argument full_output to True makes the function return additional information about the optimization; in particular, it returns a dictionary (infodict) with an entry fvec that contains the residuals (f(x, *popt) - ydata) evaluated at the solution. Moreover, the default method used for the optimization is least squares, when applicable (see the method argument).
So, if you want to know the error of the fitted function, you can use that information:
from scipy.optimize import curve_fit
import numpy as np

def objective(x, a, b, c):
    return a * x + b * x**2 + c

x = np.arange(-10, 10, 1)
data = objective(x, 1, 2, 3) + np.random.normal(0, 10, (len(x),))

popt, pcov, info, msg, ier = curve_fit(objective, x, data, full_output=True, method='lm')
y_hat = objective(x, *popt)

# The residuals are in the entry 'fvec' of the info dict.
# These are the residuals evaluated at the solution, i.e., f(x, *popt) - data.
# Compute the sum of squared residuals:
err = np.dot(info['fvec'], info['fvec'])
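To answer "how good is the model" more directly, you can turn those residuals into familiar summary metrics. Here is a minimal sketch, continuing from the variables above; the names mse, rmse, and r_squared are just illustrative:

# Continuing from the previous snippet:
residuals = info['fvec']            # f(x, *popt) - data

mse = np.mean(residuals**2)         # mean squared error
rmse = np.sqrt(mse)                 # same units as the data

# R^2: fraction of the data's variance explained by the fit
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data - np.mean(data))**2)
r_squared = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, R^2: {r_squared:.3f}")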
[Plot: the original data points and the fitted curve produced by the previous code.]
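If you want to reproduce such a plot yourself, a minimal sketch with matplotlib (assuming the x, data, and y_hat variables from the code above) could look like this:

import matplotlib.pyplot as plt

# Plot the noisy data and the fitted curve from the snippet above
plt.scatter(x, data, label='data')
plt.plot(x, y_hat, color='red', label='fitted curve')
plt.legend()
plt.show()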
Moreover, the documentation states that curve_fit uses leastsq as the optimization method if bounds are not provided; in particular, it tries to minimize the sum of squares of an error function. If bounds are provided, or if the number of observations is less than the number of variables, curve_fit uses least_squares instead; least_squares solves a bounded nonlinear least-squares problem.