How to generalise a fitting function so that SciPy's curve_fit can infer the number of inputs


I have a program that generates a SymPy lambdified function (built from a symbolic expression) with a variable number of unknown parameters. I am trying to fit this function with curve_fit without explicitly telling it how many parameters there are.

import numpy as np
import matplotlib as mpl
mpl.use('TkAgg')
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import integrate
import sympy as sy


def step_function(t, start, end):
    return 1 * ((t >= start) & (t < end))


def gen_data(x_data, model, gradients, offset=0, noise=0.1):  # generate some fake data
    # Unpack the gradients so the model can take any number of them
    y = [model(x, offset, *gradients) for x in x_data]
    y = np.asarray(y, dtype=np.float32)
    if noise:
        # Gaussian noise with standard deviation `noise`, one sample per data point
        y += np.random.normal(0, noise, y.shape)
    return y


t, a = sy.symbols('t, a')
grad_symbols = sy.symbols('b, c, d')
x = np.arange(0, 100)
gradient_changes = [0, 30, 60, 100]
gradients = [1, 2, -4]
grad = []

for t1, t2, LFD in zip(gradient_changes, gradient_changes[1:], grad_symbols):
    grad.append(((t - t1) * LFD) * sy.SingularityFunction(t, t1, gradient_changes[-1]))

data_model = sy.lambdify((t, a) + grad_symbols,
                          a + sum(grad), ({'SingularityFunction': step_function}, 'numpy'))

y = gen_data(x, data_model, gradients, noise=0.1)

fig, ax = plt.subplots()  # Create a figure containing a single axes.
ax.plot(x, y, label='Data')

popt, pcov = curve_fit(data_model, x, y)

fit_line = [data_model(time, *popt) for time in x]  # works for any number of fitted parameters
ax.plot(x, fit_line, label='Fit')

So far so good: curve_fit can detect the number of parameters in the lambdified function. However, I actually need to fit to the integral of the function over an interval around each point, because the data is not instantaneous.

Is it possible to use a starred expression like

def fit_func(x, *inputs):
    # find the average value of the model about each point
    return [integrate.quad(data_model, _x - 1 / 2, _x + 1 / 2, args=inputs)[0] for _x in x]

and then call curve_fit like so?

popt, pcov = curve_fit(fit_func, x, y)

At the moment this raises ValueError: Unable to determine number of fit parameters.


Solution

  • SciPy can work out the number of parameters when data_model is passed to curve_fit() directly, so one way to solve this is to look at how SciPy does it. Handily, every function in the SciPy docs links to the source code that implements it. When p0 is not given, curve_fit() inspects the signature of the function it is passed and counts the named arguments after the first. fit_func(x, *inputs) has only one named argument, x, so there is nothing to count and you get the ValueError.

    We can take the code which is relevant to our problem, and reduce it to this:

    import inspect
    n = len(inspect.signature(data_model).parameters)
    p0 = np.ones(n - 1)  # Exclude t parameter
    popt, pcov = curve_fit(fit_func, x, y, p0=p0)
    

    This looks at data_model, the function that fit_func() calls, and finds how many parameters it has. It then subtracts one, because the first parameter, t, is the independent variable that curve_fit() fills in from x rather than a fit parameter. p0 is set to an array of ones because that is the initial guess curve_fit() uses when p0 is not provided.
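
To see the whole idea in isolation, here is a minimal sketch that does not need SymPy. The names model, count_fit_params and true_params are hypothetical stand-ins invented for this illustration (model plays the role of the lambdified data_model), and the data is noise-free so the result is easy to check. It is meant as one possible way to wire this up, not as what SciPy does internally.

```python
import inspect

import numpy as np
from scipy import integrate
from scipy.optimize import curve_fit


def model(t, a, b, c):
    """Hypothetical stand-in for data_model: t first, then the fit parameters."""
    return a + b * t + c * t ** 2


def fit_func(x, *params):
    """Average of model over the unit-width interval centred on each x."""
    return np.array(
        [integrate.quad(model, xi - 0.5, xi + 0.5, args=params)[0] for xi in x]
    )


def count_fit_params(func):
    """Count the named positional parameters of func, excluding the first one."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    named = [
        p for p in inspect.signature(func).parameters.values()
        if p.kind in named_kinds
    ]
    return len(named) - 1


# The wrapper hides its parameters behind *params, the underlying model does not.
print(count_fit_params(fit_func))  # only x is named
print(count_fit_params(model))     # a, b and c are named

# Noise-free synthetic data generated from assumed "true" parameters.
x = np.arange(0, 20, dtype=float)
true_params = (1.0, 2.0, -0.1)
y = fit_func(x, *true_params)

# Tell curve_fit how many parameters to fit by sizing p0 from the model.
p0 = np.ones(count_fit_params(model))
popt, pcov = curve_fit(fit_func, x, y, p0=p0)
print(popt)
```

The length of p0 is the only thing curve_fit needs here, so any sensible initial guess of the right length would do in place of the array of ones.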