I am trying to fit a polynomial of arbitrary degree to some particle data, and I have noticed that curve_fit() performs much better when the parameters of the fitting function are stated explicitly than when the function accepts a variable number of arguments, i.e.
import scipy.optimize as optimize

def fit(x, a, b):
    return a + b*x

my_fit = optimize.curve_fit(fit, x_data, y_data)
performs much better than
import scipy.optimize as optimize
import numpy as np

def fit(x, *args):
    return np.sum([arg * x**i for i, arg in enumerate(args)])

my_fit = optimize.curve_fit(fit, x_data, y_data, p0=[1, 1])
Here is a comparison between the fits when my own data is used:
[Figure: comparison of the fit using *args with the fit using explicit coefficients]
Does anyone have any explanation for this behaviour?
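Both approaches actually work the same way. The problem is that the second definition of fit does not handle NumPy arrays correctly: without an axis argument, np.sum adds up every element of the stacked terms and always returns a single number instead of one value per data point. Specify axis=0, so that the sum runs only over the polynomial terms, and it will work: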
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as optimize

def fit(x, a, b):
    return a + b*x

def fit2(x, *args):
    # axis=0 sums across the polynomial terms, keeping one value per x
    return np.sum([arg * x**i for i, arg in enumerate(args)], axis=0)

# Synthetic linear data with Gaussian noise
x_data = np.linspace(-0.3, 0.3, 200)
y_data = x_data * 1000 + np.random.normal(size=x_data.shape[0], scale=20)

my_fit = optimize.curve_fit(fit, x_data, y_data, p0=[1, 1])
my_fit2 = optimize.curve_fit(fit2, x_data, y_data, p0=[1, 1])

# Plot both fits side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
ax1.scatter(x_data, y_data, alpha=0.5, color='red')
ax1.plot(x_data, fit(x_data, my_fit[0][0], my_fit[0][1]), color='k')
ax2.scatter(x_data, y_data, alpha=0.5, color='red')
ax2.plot(x_data, fit2(x_data, my_fit2[0][0], my_fit2[0][1]), color='k')
plt.show()
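
To see the difference concretely, here is a minimal sketch on a three-point array. The names poly_no_axis and poly_axis0 are made up for this illustration and are not part of the question's code; the coefficients are arbitrary.

```python
import numpy as np

def poly_no_axis(x, *coeffs):
    # Hypothetical helper: sums over every element of the stacked terms,
    # so the result is a single scalar regardless of len(x).
    return np.sum([c * x**i for i, c in enumerate(coeffs)])

def poly_axis0(x, *coeffs):
    # Hypothetical helper: sums only across the terms (axis 0),
    # so the result has one value per element of x.
    return np.sum([c * x**i for i, c in enumerate(coeffs)], axis=0)

x = np.array([0.0, 1.0, 2.0])
coeffs = (1.0, 2.0)  # represents 1 + 2*x

collapsed = poly_no_axis(x, *coeffs)
per_point = poly_axis0(x, *coeffs)

print(np.shape(collapsed))  # ()
print(per_point)            # [1. 3. 5.]
```

As far as I can tell, the scalar returned by the first version is simply broadcast against y_data inside curve_fit, so every data point is compared with the same model value, which would explain the poor fit in the question.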