Search code examples
python-3.xcurve-fittingscipy-optimizedata-fittingsigmoid

Curve fit and extrapolate for sigmoid function in Python


I have the below dataset:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

## Given datapoints
xdata = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26])
ydata = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.99330715, 0.98201379, 0.95257413, 0.88079708, 0.73105858, 0.5])


## Plot the data
plt.plot(xdata, ydata)
plt.xlabel('x')
plt.ylabel('sigmoid(x)')
plt.xlim([-1,31])
plt.ylim(0, 1.05)
plt.show()

The above data looks as such:

enter image description here

A curve needs to be caliberated and extrapolated for y decreasing from 1 to 0 by using curve_fit in python.

I am trying to use sigmoid function provided that 'y' is given and 'x' need to be found.

y=1/(1+e^(k(x-x_0)) )

The sigmoid function to fit 'x' is thus defined as such:

## Define sigmoid function to fit xdata
def sigmoid(y, x0, k):
    x = x0 + ((1/k)*(np.log((1/y)-1)))
    return x

## Initial guess
p0 = [np.median(xdata),       # x0
      0.1]                    # k

## Initialize curve fit
popt, pcov = curve_fit(sigmoid,
                       ydata,
                       xdata)


## Define values for y
y = np.arange(1,0,-0.001)

## Evaluate values for x
x = sigmoid(y, *popt)

## Plot tbe actual and fit data
plt.plot(xdata, ydata, 'o', label='data')
plt.plot(x,y, label='fit')
plt.xlim([-10,31])
plt.ylim(0, 1.05)
plt.legend(loc='best')
plt.show()

The fit data looks as such:

enter image description here

It is quite visible that the fit is not good.

Can somebody please let me know how I can fit a curve close to actual data?


Solution

  • You might find lmfit useful for this (disclaimer: I am a lead author), as it has sigmoidal step functions built in and can readily do fits and use those results to interpolate or extrapolate. It also gives more useful reports of results than curve_fit, with variable parameters that have meaningful names and uncertainties and correlations correctly calculated, sorted and assigned.

    Your example might look like this:

    import numpy as np
    import matplotlib.pyplot as plt
    
    from lmfit.models import StepModel, ConstantModel
    
    #
    xdata = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26])
    ydata = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.99330715, 0.98201379, 0.95257413, 0.88079708, 0.73105858, 0.5])
    
    # create a logistic step function + an offset constant
    model = StepModel(form='logistic') + ConstantModel()
    
    # make a set of parameters with initial values
    params = model.make_params(c=1, amplitude=-1, center=25, sigma=2.5)
    
    # fit the data with the parameters and `x` independent variable
    result = model.fit(ydata, params, x=xdata)
    
    # print results
    print(result.fit_report())
    
    # evaluate the model with best-fit parameters to 
    # interpolate and extrapolate to higher x values
    xnew = np.linspace(15, 45, 121)
    ynew = model.eval(result.params, x=xnew)
    
    
    ## Plot the data, best-fit result, and extrapolated data
    plt.plot(xdata, ydata, 'o', label='data')
    plt.plot(xdata, result.best_fit, '-', label='best fit')
    plt.plot(xnew,  ynew, '+', label='exptrapolated')
    plt.xlabel('x')
    plt.ylabel('sigmoid(x)')
    plt.xlim([-1,41])
    plt.ylim(0, 1.05)
    plt.legend()
    plt.show()
    

    which would print a report of

    [[Model]]
        (Model(step, form='logistic') + Model(constant))
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 82
        # data points      = 27
        # variables        = 4
        chi-square         = 6.4177e-06
        reduced chi-square = 2.7903e-07
        Akaike info crit   = -403.811763
        Bayesian info crit = -398.628416
        R-squared          = 0.99997896
    [[Variables]]
        amplitude: -0.99669011 +/- 0.01320863 (1.33%) (init = -1)
        center:     25.9928183 +/- 0.02578216 (0.10%) (init = 25)
        sigma:      0.99867632 +/- 0.00629747 (0.63%) (init = 2.5)
        c:          1.00015992 +/- 1.1486e-04 (0.01%) (init = 1)
    [[Correlations]] (unreported correlations are < 0.100)
        C(amplitude, center) = -0.997
        C(center, sigma)     = 0.954
        C(amplitude, sigma)  = -0.943
        C(sigma, c)          = 0.287
        C(amplitude, c)      = -0.223
    

    and make a plot like this: plot of data, best-fit, and extrapolation with logistic step function