python scipy curve-fitting data-fitting scipy-optimize

Why does scipy.optimize.curve_fit() perform better with a fixed number of arguments in the fitting function?


I am trying to fit a polynomial of arbitrary degree to some particle data, and I have noticed that curve_fit() performs much better when the arguments of the fitting function are stated explicitly than when a variable number of arguments is passed via *args, i.e.

import scipy.optimize as optimize


def fit(x, a, b):
    return a + b*x

my_fit = optimize.curve_fit(fit, x_data, y_data)

performs much better than

import scipy.optimize as optimize
import numpy as np


def fit(x, *args):
    return np.sum([arg * x**i for i, arg in enumerate(args)])

my_fit = optimize.curve_fit(fit, x_data, y_data, p0=[1, 1])

Here is a comparison between the fits when my own data is used:

[Figure: comparison of the fits using *args and explicit coefficients]

Does anyone have any explanation for this behaviour?
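Evaluating both models on a small array makes the difference visible. This is a minimal check, with `fit_args` as a hypothetical rename of the question's variadic `fit` so that both definitions can coexist:

```python
import numpy as np


def fit(x, a, b):
    # Explicit straight line: returns one value per element of x.
    return a + b * x


def fit_args(x, *args):
    # Hypothetical rename of the question's variadic model.
    # np.sum with no axis collapses the list of arrays into a single scalar.
    return np.sum([arg * x**i for i, arg in enumerate(args)])


x = np.array([0.0, 1.0, 2.0])
explicit = fit(x, 1.0, 1.0)
variadic = fit_args(x, 1.0, 1.0)

print(np.shape(explicit))  # (3,)
print(np.shape(variadic))  # ()
```

Because curve_fit compares the model output with y_data point by point, a scalar return is broadcast against every data point. The parameters can then only move that one number, so the best the optimizer can do is set it close to the mean of the data.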


Solution

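  • It actually works the same way; the problem is that the second definition of fit does not work for NumPy arrays. curve_fit calls the model with the whole x_data array, and np.sum without an axis argument sums over every element, so the model always returns a single number instead of one value per data point. Just specify the axis and it will work: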

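    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.optimize as optimize
    
    def fit(x, a, b):
        # Explicit straight line: one output value per element of x
        return a + b*x
    
    def fit2(x, *args):
        # Variadic polynomial: args[i] multiplies x**i.
        # axis=0 sums the terms element-wise, keeping one value per x.
        return np.sum([arg * x**i for i, arg in enumerate(args)], axis=0)
    
    
    # Synthetic test data: a steep line plus Gaussian noise
    x_data = np.linspace(-0.3, 0.3, 200)
    y_data = x_data * 1000 + np.random.normal(size=x_data.shape[0], scale=20)
    
    # p0 is required for fit2, since curve_fit cannot count *args parameters
    my_fit = optimize.curve_fit(fit, x_data, y_data, p0=[1, 1])
    my_fit2 = optimize.curve_fit(fit2, x_data, y_data, p0=[1, 1])
    
    # Plot both fits side by side; the fitted lines should coincide
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    ax1.scatter(x_data, y_data, alpha=0.5, color='red')
    ax1.plot(x_data, fit(x_data, *my_fit[0]), color='k')
    ax2.scatter(x_data, y_data, alpha=0.5, color='red')
    ax2.plot(x_data, fit2(x_data, *my_fit2[0]), color='k')
    plt.show()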
    

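An alternative for arbitrary-degree polynomials is to let NumPy evaluate the polynomial, which is vectorized by construction. This sketch is not the answer's approach: it swaps in numpy.polynomial.polynomial.polyval (which takes coefficients in increasing order, matching enumerate(args)), uses a hypothetical model name `poly_model`, and assumes synthetic quadratic data in place of real particle data. It also shows that, with a *args model, the length of p0 is what tells curve_fit how many coefficients to fit.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import curve_fit


def poly_model(x, *coeffs):
    # Hypothetical model: coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ...
    # evaluated element-wise, so the output has the same shape as x.
    return P.polyval(x, coeffs)


# Synthetic data from a known quadratic plus noise (assumption: stands in for real data).
rng = np.random.default_rng(0)
x_data = np.linspace(-1.0, 1.0, 200)
y_data = 2.0 - 3.0 * x_data + 0.5 * x_data**2 + rng.normal(scale=0.05, size=x_data.size)

# With *coeffs, curve_fit cannot count the parameters from the signature,
# so calling it without p0 raises ValueError.
try:
    curve_fit(poly_model, x_data, y_data)
    needs_p0 = False
except ValueError:
    needs_p0 = True

# The length of p0 sets the number of coefficients: degree = len(p0) - 1.
degree = 2
popt, pcov = curve_fit(poly_model, x_data, y_data, p0=np.ones(degree + 1))

print(needs_p0)
print(np.round(popt, 2))
```

Changing `degree` changes the number of fitted coefficients without touching the model. Since a polynomial is linear in its coefficients, `numpy.polynomial.polynomial.polyfit(x_data, y_data, degree)` can also solve this directly, without an initial guess.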