Tags: python, scipy, curve-fitting, sigmoid, goodness-of-fit

Sigmoid curve detection


I have a tabular dataset in which each row represents a curve of 42 values (data points). The goal is to filter out the curves that do not follow a sigmoid function.

Technique applied

  1. Sigmoid Curve Fitting
  2. Calculate goodness of fit


Curve fitting source

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, L, x0, k, b):
    # L: curve height, x0: midpoint, k: steepness, b: vertical offset
    return L / (1 + np.exp(-k * (x - x0))) + b

# Initial guess for [L, x0, k, b]
p0 = [max(y), np.median(x), 1, min(y)]

popt, pcov = curve_fit(sigmoid, x, y, p0, method='dogbox', maxfev=10000)

Plotting

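import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

yd = sigmoid(x, *popt)  # fitted curve evaluated at the data points
plt.plot(x, y, 'o', label='data')
plt.plot(x, yd, label='fit')
plt.legend(loc='best')
plt.show()

For the sigmoid curve, r2_score(y, yd) = 0.99.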


However, even when the curve is not sigmoid, the fit is still very good and I get r2_score(y, yd) = 0.98.


Example data

Sigmoid:
        [154.02811505496447,
         146.39766673379745,
         130.55841841263054,
         105.90461009146338,
         66.8461297702961,
         22.543803049129565,
         -13.688227352037302,
         -31.754967769204086,
         -36.574590925571556,
         -34.31173263297842,
         -27.98295459843348,
         -17.624496325705877,
         -2.2469180569519267,
         20.740420258644008,
         54.053534582814336,
         104.15375611806758,
         180.67655429725164,
         299.0412892474392,
         473.8589268806131,
         712.1355324045853,
         1010.3945120433141,
         1353.3417600831544,
         1722.423136626168,
         2095.8689925500385,
         2453.614570050715,
         2779.492987742925,
         3064.6579177888016,
         3304.9067183437182,
         3500.629595471177,
         3654.4640620149517,
         3773.8156617564973,
         3866.2930060208614,
         3937.098925829344,
         3990.995709651212,
         4032.976381384583,
         4066.19200350293,
         4094.2713932805746,
         4117.570526667072,
         4137.0863623072,
         4154.089487119825,
         4169.671081872018,
         4185.233572233441]
Non-sigmoid:
[489.2834973631293,
 361.00794898560935,
 263.98040060808944,
 176.09045223057,
 110.87762385304995,
 63.42773947552996,
 42.065867898009856,
 29.47418768048965,
 23.254148294970037,
 17.262475347849886,
 13.390803854810201,
 5.18880594026632,
 -4.0552569677629435,
 -9.77379815878885,
 -15.39564800511198,
 -17.0930552390937,
 -22.386235681666676,
 -24.01368224348971,
 -27.6271366708811,
 -28.704645895235444,
 -26.672167652096505,
 -20.310502874851863,
 -17.661003297287152,
 -15.088099452837014,
 -15.872947794945503,
 -8.34466572098927,
 -1.6253080011324528,
 6.594890931118698,
 10.953473235028014,
 14.039900455748466,
 17.299573334162687,
 16.739464327477435,
 16.650048075311133,
 13.090813997028818,
 12.731754904427362,
 12.118767243738603,
 12.095028866568555,
 11.33835463248488,
 5.952943083721948,
 -0.7048030993591965,
 -9.088792078874576,
 -15.823553268803153]


Solution

  • The problem is that you are using unbounded parameters. For example, if you allow L to be negative, you can fit a monotonically decreasing dataset with your function.
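
    To see why the sign of L matters, it may help to differentiate the model from the question, writing $u = x - x_0$:

    ```latex
    y(x) = \frac{L}{1 + e^{-k u}} + b,
    \qquad
    \frac{dy}{dx} = \frac{L\,k\,e^{-k u}}{\left(1 + e^{-k u}\right)^{2}},
    \qquad
    \frac{L}{1 + e^{-k u}} = L - \frac{L}{1 + e^{k u}}
    ```

    The factor $e^{-k u} / (1 + e^{-k u})^{2}$ is always positive, so the slope has the sign of $L k$: the curve falls whenever L and k have opposite signs. The last identity shows that $(L, k, b)$ and $(-L, -k, b + L)$ describe the same curve, so a rising sigmoid can always be written with both L and k non-negative, and bounding both of them excludes the falling shapes.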

    If I add simple non-negativity bounds to your fit, I get:

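    def sigmoid(x, L, x0, k, b):
        return L / (1 + np.exp(-k * (x - x0))) + b

    # Initial guess for [L, x0, k, b]; every entry must lie within the bounds
    p0 = [max(y), np.median(x), 1, 0]

    # bounds=(0, np.inf) constrains all four parameters to be non-negative
    popt, pcov = curve_fit(sigmoid, x, y, p0, method='dogbox', maxfev=10000, bounds=(0, np.inf))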
    

    [Figures: the bounded fit on the sigmoid data and on the non-sigmoid data]

    You can play with the bounds to better restrict the fitting to your allowable range of shapes.
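
As one possible way to turn the bounded fit into a filter, the sketch below fits each curve with per-parameter bounds and keeps it only if R² clears a threshold. The helper name `is_sigmoid`, the cutoff `r2_min=0.95`, and the specific limits (L and k non-negative, x0 inside the sampled x range, b unrestricted) are assumptions for illustration, not values from the question; `scipy.special.expit` is used only as a numerically stable way of writing the same sigmoid.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit  # numerically stable 1 / (1 + exp(-z))


def sigmoid(x, L, x0, k, b):
    # Same curve as in the question: L / (1 + exp(-k * (x - x0))) + b
    return L * expit(k * (x - x0)) + b


def is_sigmoid(x, y, r2_min=0.95):
    """Hypothetical helper: True if a rising sigmoid fits y with R^2 >= r2_min."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return False  # a constant curve has no sigmoid shape to detect

    # Initial guess [L, x0, k, b]; it must lie inside the bounds below.
    p0 = [y.max() - y.min(), np.median(x), 1.0, y.min()]
    # Assumed limits: L >= 0 and k >= 0 (rising curve only),
    # midpoint x0 inside the sampled x range, offset b unrestricted.
    lower = [0.0, x.min(), 0.0, -np.inf]
    upper = [np.inf, x.max(), np.inf, np.inf]
    try:
        popt, _ = curve_fit(sigmoid, x, y, p0, method='dogbox',
                            maxfev=10000, bounds=(lower, upper))
    except RuntimeError:
        return False  # the optimizer did not converge: treat as non-sigmoid

    ss_res = np.sum((y - sigmoid(x, *popt)) ** 2)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return bool(r2 >= r2_min)


# Synthetic curves for illustration only (not the data from the question).
x = np.arange(42.0)
rising = 4000 * expit(0.4 * (x - 20)) + 10
decaying = 500 * np.exp(-x / 5)
print(is_sigmoid(x, rising), is_sigmoid(x, decaying))
```

With L and k both bounded below by zero the model can only rise, so a purely decreasing curve cannot score better than a flat line and is rejected. The threshold and the limits would need tuning against real curves, for example ones that dip before rising like the sigmoid sample in the question.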