This is my first time using scipy.optimize.fmin(), and I'm not sure I've written my function correctly. I'm getting this error:
ValueError: Unable to coerce to Series, length must be 11: given 0
I've tried reading the few other posts about this error, but I can't find where I'm going wrong.
What I ultimately need are the three coefficients (a, b, c) in this equation:
(a + (b * np.log10(x))) * ((x - c)/x)
where x holds the values from one column of a pandas dataframe, and the equation is the curve I'm fitting to that data.
I've written a function that takes the computed (fitted) value of that equation at each point, subtracts the experimental value at that point, and returns the sum of the squared differences:
def fit(guesses, df):
    # Guesses is a list of the three coefficients
    # Requires that the dataframe has a column named 'x' and a column named 'y'
    a, b, c = guesses
    y_model = np.array([((a + (b * np.log10(x))) * ((x - c)/x)) - y for x, y in zip(df['x'], df['y'])])
    squares = np.square(y_model)
    squared_sum = sum(squares)
    return squared_sum
Then I run something like this (df is already defined as a pandas dataframe with 8 rows, in this case, and several columns including x and y):
a=1
b=2
c=3
guesses = [a,b,c]
from scipy import optimize
optimize.fmin(fit, guesses, args=(df))
I am thinking that somewhere, something is supposed to be an array (or not an array?), but I can't find it.
Edit: I can't share the real data, but the dataframe has 8 rows, and x and y have values in each row. The values for x are on the order of 10^-7 and the values for y are on the order of 10^-1. If it matters, both have 6 sig figs.
Looking closer, I realized what the issue is: args expects a tuple, but passing args=(df) does not produce one. Parentheses on their own only group an expression, so (df) is just df; it is the trailing comma that makes a tuple. You need args=(df,) instead:
res = optimize.fmin(fit, guesses, args=(df,))
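To see the difference between the two spellings, here is a small sketch. The two-row dataframe is a made-up stand-in, not your data:

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

not_a_tuple = (df)   # parentheses only group; this is still the DataFrame itself
one_tuple = (df,)    # the trailing comma is what creates a one-element tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__, len(one_tuple))
```

The first line prints the dataframe's type and the second prints tuple with a length of 1, which is the form args needs.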
Some unsolicited advice: you can vectorize the operations, either by working on the pandas Series directly or by converting the data to numpy arrays. In this case it won't make a difference because you have very little data, but with more data the difference could be significant.
def fit(guesses, df):
    a, b, c = guesses
    x = df["x"]
    y = df["y"]
    # to numpy
    #x = df["x"].to_numpy()
    #y = df["y"].to_numpy()
    y_model = ((a + (b*np.log10(x)))*((x - c)/x)) - y
    return (y_model**2).sum()
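Since the real data can't be shared, here is a rough end-to-end sketch on invented data, just to show the vectorized function and the args=(df,) call working together. The x values, the "true" coefficients, and the starting guesses are all assumptions chosen only so that x is on the order of 10^-7 and y on the order of 10^-1, as described in the question:

```python
import numpy as np
import pandas as pd
from scipy import optimize


def fit(guesses, df):
    """Sum of squared residuals between the model curve and df['y']."""
    a, b, c = guesses
    x = df["x"].to_numpy()
    y = df["y"].to_numpy()
    residuals = (a + b * np.log10(x)) * ((x - c) / x) - y
    return (residuals ** 2).sum()


# Made-up data: 8 rows, generated from assumed coefficients (not real values)
true_a, true_b, true_c = 1.0, 0.1, 1e-8
x = np.linspace(1e-7, 8e-7, 8)
y = (true_a + true_b * np.log10(x)) * ((x - true_c) / x)
df = pd.DataFrame({"x": x, "y": y})

# Assumed starting guesses, deliberately close to the made-up coefficients
guesses = [0.9, 0.12, 2e-8]

# Note the trailing comma: args must be a tuple
res = optimize.fmin(fit, guesses, args=(df,), disp=False)

print("fitted coefficients:", res)
print("objective at guesses:", fit(guesses, df))
print("objective at result: ", fit(res, df))
```

This only shows that the call runs and that the objective does not get worse than at the starting point. How well it converges on your actual data will depend on your starting guesses, so treat it as a starting point rather than a tuned fit.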