python numpy scipy mathematical-optimization curve-fitting

scipy curve_fit fails on easy linear fit?

I intend to do a simple linear fit with two numpy arrays y1 and y2 of length 54 each. The function is defined as follows:

def f(x,b):
    return b*x

The data are plotted here:

Then I tried fitting via:

import scipy.optimize as scop  # assuming scop is an alias for scipy.optimize
popt, pcov = scop.curve_fit(f, y2, y1)  # yes, y1 and y2 are in the right order

Result: popt = 1., pcov = inf

I tried with p0 = -833, which is more or less what the result should be, but it gives me popt = -833, pcov = inf.

I tried some sample data with a sample function:

x = np.array(range(10))
y = x**2 + 3
def fu(x,b):
    return x**2 + b
po, pc = scop.curve_fit(fu,x,y)
print(po, pc)

Result is just fine: 3 and 2e-33

Does anybody have an idea as to what went wrong with the first trial? I haven't found anything useful or connected to my problem yet...

Solution

NaN values in the input will produce meaningless results, so you need to exclude them from your data before doing any fitting. You can use boolean indexing to do this:

valid = ~(np.isnan(y1) | np.isnan(y2))
popt, pcov = scop.curve_fit(f, y2[valid], y1[valid])
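
To make the masking step easier to try out, here is a minimal self-contained sketch. The data are synthetic stand-ins (a slope near -833 plus noise, with a few NaNs injected), not your actual y1 and y2, and I use np.isfinite rather than np.isnan so that Infs are dropped as well. Since the model is y = b*x, the closed-form least-squares slope sum(x*y)/sum(x**2) serves as a cross-check on what curve_fit returns.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, b):
    # One-parameter linear model through the origin
    return b * x

# Synthetic stand-in for the real data (assumed values, for illustration only)
rng = np.random.default_rng(0)
y2 = np.linspace(0.001, 0.054, 54)
y1 = -833.0 * y2 + rng.normal(scale=0.5, size=y2.size)
y1[[3, 17]] = np.nan  # inject missing values into y1
y2[40] = np.nan       # and one into y2

# Keep only the points where both arrays are finite
valid = np.isfinite(y1) & np.isfinite(y2)
popt, pcov = curve_fit(f, y2[valid], y1[valid], p0=[-800.0])

# Closed-form least-squares slope for y = b*x, used as a cross-check
b_closed = np.sum(y2[valid] * y1[valid]) / np.sum(y2[valid] ** 2)

print(popt[0], pcov[0, 0], b_closed)
```

With the NaNs masked out, the fitted slope should agree with the closed-form value and pcov should be finite.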

As mentioned in the comments, in versions of SciPy newer than 0.15.0, curve_fit automatically checks for NaNs and Infs in your input arrays and raises a ValueError if any are found. This check can be disabled using the check_finite parameter.
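
A small sketch of what that check looks like in practice, assuming a SciPy version in which the finite check is enabled by default. The arrays here are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, b):
    return b * x

# Illustrative data with one NaN in the independent variable
x = np.array([1.0, 2.0, np.nan, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

try:
    curve_fit(f, x, y)
    raised = False
except ValueError as exc:
    # Recent SciPy versions refuse non-finite input by default
    raised = True
    print("curve_fit rejected the input:", exc)
```

On an older SciPy without the check, the call would go through and return an unusable result instead. Passing check_finite=False skips the check on newer versions too, and the SciPy documentation warns that this may silently produce nonsensical results when the input contains NaNs.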

Based on your question and comments, I assume you are using an older version, so you should probably consider upgrading.
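
If you are unsure which version you have, a quick check along these lines should tell you. The threshold is taken from the wording above and only compares the major and minor components, so it does not distinguish patch releases of 0.15.

```python
import scipy

# Compare only the major and minor components, e.g. "1.11.4" -> (1, 11)
major, minor = (int(part) for part in scipy.__version__.split(".")[:2])

# Rough threshold based on "newer than 0.15.0"; 0.15.x patch releases
# are treated as old here, which is an approximation
has_finite_check = (major, minor) > (0, 15)

print("SciPy", scipy.__version__, "| finite check expected:", has_finite_check)
```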