python scipy-optimize scipy-optimize-minimize

scipy.optimize.fmin "ValueError: Unable to coerce to Series"

This is my first time using scipy.optimize.fmin(), and I'm not sure I've written my function correctly. I'm getting this error:

ValueError: Unable to coerce to Series, length must be 11: given 0

I've tried reading the few other posts about this error, but I can't find where I'm going wrong.

What I ultimately need are the three coefficients (a, b, c) in the equation (a + (b * np.log10(x))) * ((x - c)/x), where x holds the values of one column in a pandas DataFrame and the equation is the curve I'm fitting to that data.

I've written a function that takes the difference between the computed (fitted) value of that equation and the experimental value at each point, and then returns the sum of the squared differences:

import numpy as np

def fit(guesses, df):
    # Guesses is a list of the three coefficients
    # Requires that the dataframe has a column named 'x' and a column named 'y'
    a, b, c = guesses
    y_model = np.array([((a + (b * np.log10(x))) * ((x - c)/x)) - y for x, y in zip(df['x'], df['y'])])
    squares = np.square(y_model)
    squared_sum = sum(squares)
    return squared_sum

Then I run something like this (df is already defined as a pandas dataframe with 8 rows, in this case, and several columns including x and y):

a=1
b=2
c=3
guesses = [a,b,c]

from scipy import optimize
optimize.fmin(fit, guesses, args=(df))

I am thinking that somewhere, something is supposed to be an array (or not an array?), but I can't find it.

Edit: I can't share the real data, but the dataframe has 8 rows, and x and y have values in each row. The values for x are on the order of 10^-7 and the values for y are on the order of 10^-1. If it matters, both have 6 sig figs.

Solution

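The problem is the args argument. fmin expects args to be a tuple of extra arguments for your function, but args=(df) does not produce a tuple: the parentheses only group the expression, so it is the same as writing args=df. It is the trailing comma that makes a one-element tuple, so you need args=(df,). The error message itself comes from pandas, presumably because the bare DataFrame ends up in an operation that was meant for a tuple.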

res = optimize.fmin(fit, guesses, args=(df,))
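
To see why the comma matters, here is a minimal sketch in plain Python. The list named data is only a hypothetical stand-in for the DataFrame, so nothing here depends on pandas or SciPy:

```python
# Hypothetical stand-in for the DataFrame; any object behaves the same way.
data = [1.0, 2.0, 3.0]

grouped = (data)     # parentheses only group: this is still the list itself
one_tuple = (data,)  # the trailing comma creates a one-element tuple

print(type(grouped).__name__)    # list
print(type(one_tuple).__name__)  # tuple
print(grouped is data)           # True
print(len(one_tuple))            # 1
```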

Some unsolicited advice: you can vectorize the computation by operating directly on the pandas Series, or by converting the columns to NumPy arrays first. With only 8 rows it won't make a noticeable difference, but with more data the speedup could be significant.

def fit(guesses, df):
    a, b, c = guesses
    x = df["x"]
    y = df["y"]
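    # Alternatively, convert the columns to NumPy arrays first:
    # x = df["x"].to_numpy()
    # y = df["y"].to_numpy()
    # Residuals (model minus data) for all rows at once
    residuals = ((a + (b*np.log10(x)))*((x - c)/x)) - y
    return (residuals**2).sum()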
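
Since the real data can't be shared, here is a rough end-to-end sketch on made-up data. The helper names model, fit_loop and fit_vectorized, the generated x values and the coefficients used to build y are all assumptions chosen only to match the scales described in the question. It checks that the loop and vectorized objectives agree and that fmin runs once args is a real tuple:

```python
import numpy as np
import pandas as pd
from scipy import optimize

def model(x, a, b, c):
    # The curve from the question: (a + b*log10(x)) * ((x - c)/x)
    return (a + b * np.log10(x)) * ((x - c) / x)

def fit_loop(guesses, df):
    # Row-by-row version, equivalent to the function in the question
    a, b, c = guesses
    residuals = np.array([model(x, a, b, c) - y for x, y in zip(df["x"], df["y"])])
    return np.square(residuals).sum()

def fit_vectorized(guesses, df):
    # Whole-column version operating on NumPy arrays
    a, b, c = guesses
    residuals = model(df["x"].to_numpy(), a, b, c) - df["y"].to_numpy()
    return (residuals**2).sum()

# Made-up data (assumption): 8 rows, x on the order of 1e-7, y on the order of 1e-1
x = np.linspace(1e-7, 8e-7, 8)
df = pd.DataFrame({"x": x, "y": model(x, 0.5, 0.05, 2e-8)})

guesses = [1, 2, 3]

# Both objectives should return the same sum of squares for the same input
print(np.isclose(fit_loop(guesses, df), fit_vectorized(guesses, df)))

# Note the trailing comma: args must be a tuple
res = optimize.fmin(fit_vectorized, guesses, args=(df,), disp=False)
print(res)                      # fitted a, b, c
print(fit_vectorized(res, df))  # sum of squares at the returned point
```

With a starting guess for c that is many orders of magnitude larger than the x values, the simplex search may well stop far from the coefficients used to generate the data, so a starting guess on the scale of the data is likely to help.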