Tags: python, numpy, scipy, regression, curve-fitting

scipy curve_fit not working correctly


It seems like it only fits the first parameter.

Everything worked fine when I generated a curve with the curve_fit example, but not when I use my own data.

Here is my raw data: https://pastebin.com/0hs2JVXL

For simplicity I convert it to integers, so here is the data I'm actually feeding to curve_fit: https://pastebin.com/2uP0iyTw

This is the curve I'm trying to fit (basically the lognormal probability density multiplied by a scale factor k):

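import numpy as np

def func(x, k, s, u):
    # k * lognormal pdf, where u and s are the mean and standard deviation of ln(x)
    x = np.array(x)
    return k * (1 / (x * s * np.sqrt(2 * np.pi))) * np.exp(-np.power(np.log(x) - u, 2) / (2 * np.power(s, 2)))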
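As a sanity check on the model itself: `func` is k times the lognormal density whose logarithm has mean `u` and standard deviation `s`, which SciPy also provides as `scipy.stats.lognorm.pdf(x, s, scale=np.exp(u))`. A minimal sketch comparing the two; the grid and parameter values below are arbitrary illustrations, not the question's data.

```python
import numpy as np
from scipy.stats import lognorm


def func(x, k, s, u):
    """k-scaled lognormal density, as written in the question."""
    x = np.array(x, dtype=float)
    return k * (1 / (x * s * np.sqrt(2 * np.pi))) * np.exp(-np.power(np.log(x) - u, 2) / (2 * np.power(s, 2)))


def func_via_scipy(x, k, s, u):
    """Same curve built from scipy.stats: shape parameter s, scale exp(u)."""
    return k * lognorm.pdf(x, s, scale=np.exp(u))


# Illustrative grid and parameters only.
x = np.linspace(0.1, 50.0, 500)
print(np.allclose(func(x, 3.0, 0.5, 2.0), func_via_scipy(x, 3.0, 0.5, 2.0)))  # True
```

If the two agree, the formula is not the problem, which points at the optimizer's starting point instead.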

And this is how I'm using it:

graphData.append(
    {
        'x': xdata,
        'y': ydata,
        'name': "data",
        'mode': 'lines'
    }
)

popt, pcov = curve_fit(func, xdata, ydata)

graphData.append(
    {
        'x': xdata,
        'y': func(xdata, *popt),
        'name': "" + "[Fit]",
        'mode': 'lines'
    }
)

But this is the output data I'm getting: https://pastebin.com/WjaTH9wQ

And these are the parameters it ends up with: k = 33.06185171, s = 1.0, u = 1.0

And to visualize:

[Figure: the data and the fitted curve]

I am not sure at all why this is not working.


Solution

  • Using one of the ideas from the comments, I got it to work:

    from scipy.optimize import curve_fit
    import matplotlib.pyplot as pyplot
    import numpy as np

    # Two columns (x, y) with one header row
    data = np.loadtxt("scipycurve.csv", delimiter=",", skiprows=1)
    xdata = data[:, 0]
    ydata = data[:, 1]

    def func(x, k, s, u):
        # k * lognormal pdf, where u and s are the mean and standard deviation of ln(x)
        x = np.asarray(x, dtype=float)
        return k * (1 / (x * s * np.sqrt(2 * np.pi))) * np.exp(-np.power(np.log(x) - u, 2) / (2 * np.power(s, 2)))

    # Initial guess for [k, s, u]; without p0, curve_fit starts every parameter at 1
    p0 = [1000, 1, 10]
    popt, pcov = curve_fit(func, xdata, ydata, p0=p0)

    pyplot.figure()
    pyplot.plot(xdata, ydata, label='Data', marker='o')
    pyplot.plot(xdata, func(xdata, *popt), 'g--', label='Fit')
    pyplot.legend()
    pyplot.show()

    print(popt)
    

    [Figure: plot of data and fitted curve]

    [ 6.84279941e+07 5.09882839e-01 1.05414859e+01]
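One way to see why the starting point matters: the peak (mode) of a lognormal density with parameters s and u sits at x = exp(u - s²), and the scale k does not move it. The all-ones default start puts that peak at x = 1, while the parameters printed above put it at about x = 29,000. A small sketch of that arithmetic; the helper name `lognormal_mode` is hypothetical.

```python
import numpy as np


def lognormal_mode(s, u):
    """Peak location of k * lognormal_pdf(x; s, u); k does not affect it."""
    return np.exp(u - s**2)


# curve_fit's default starting point when p0 is omitted: every parameter = 1.
print(lognormal_mode(1.0, 1.0))  # 1.0

# Parameters [k, s, u] printed by the working fit above.
popt = [6.84279941e07, 5.09882839e-01, 1.05414859e01]
print(lognormal_mode(popt[1], popt[2]))  # roughly 2.9e4
```

If the data are concentrated around that far-away peak, the default curve and its derivatives with respect to s and u are vanishingly small there, which is a plausible reason the optimizer left s and u at 1 and only adjusted k.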

    Hope it helps. It looks like the algorithm just needs some help in this case: a starting guess for the parameters, passed as p0. Without it, s and u stayed at their default starting value of 1.
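If hand-picking `p0` is inconvenient, a rough starting point can often be estimated from the data itself. The sketch below treats y as an unnormalized density on x: the area under it estimates k (the lognormal density integrates to 1), and the y-weighted mean and spread of ln(x) estimate u and s. The helper `guess_p0` is hypothetical, it assumes the x range covers most of the peak, and the synthetic data (generated from values close to the parameters printed above, plus noise) stands in for the question's CSV, which is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, k, s, u):
    """k-scaled lognormal density (the model from the question)."""
    x = np.asarray(x, dtype=float)
    return k / (x * s * np.sqrt(2 * np.pi)) * np.exp(-((np.log(x) - u) ** 2) / (2 * s**2))


def guess_p0(x, y):
    """Hypothetical helper: rough [k, s, u] from the shape of the data."""
    x = np.asarray(x, dtype=float)
    y = np.clip(np.asarray(y, dtype=float), 0.0, None)  # drop negative noise
    w = y * np.gradient(x)                # y * dx: mass carried by each sample
    k0 = w.sum()                          # area under the curve ~ k
    logx = np.log(x)
    u0 = np.sum(w * logx) / k0            # weighted mean of ln(x) ~ u
    s0 = np.sqrt(np.sum(w * (logx - u0) ** 2) / k0)  # weighted std of ln(x) ~ s
    return [k0, s0, u0]


# Synthetic stand-in for the question's data.
rng = np.random.default_rng(0)
x = np.linspace(1_000, 250_000, 400)
true_params = (6.84e7, 0.51, 10.54)
y = func(x, *true_params) + rng.normal(scale=5.0, size=x.size)

p0 = guess_p0(x, y)
popt, pcov = curve_fit(func, x, y, p0=p0)
print("p0:  ", np.round(p0, 3))
print("popt:", np.round(popt, 3))
```

The estimate only needs to land in the right neighbourhood; curve_fit then refines it. On real data with a noisy baseline, the clipped noise in the tails biases k and s upward somewhat, which is usually harmless for a starting guess.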