Search code examples
pythonfunctionscipydata-fitting

Fit model function out defined data range


Let's say I have a set of data (x=times,y=observation) that have multiple gaps in time. Whatever is the data trend, let's assume it linear for this discussion. During the gaps in time, there is a decay that makes data deviate from the purely linear trend, until observations start again and the linear trend is recovered. There are multiple gaps in time, but in this example I have reported only the shortest snap to illustrate the problem. The gaps in time are times between the (positive) linear trends where there are no observations available, therefore the difference between consecutive x=times is (much) larger than, let's say, the average. I want to model the decay as part of the function (y_decay = C -D*x)

from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def f(x, A, B, C, D):
    line = A*x + B if ((x>=1) & (x<=3) | (x>=5) & (x<=9) | (x>=23) & (x<=25)) else C-D*x
    return line

x=[1,2,3, 12,13,14, 23,24,25]
y=[2,4,6, 5, 7, 9, 8, 10,12]
popt, pcov = curve_fit(f, x, y) 

figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()

plot.scatter(x,y)
plt.show()

enter image description here

How do I model the decay variable as part of the function and obtain its best-fit value?


Solution

  • With full periodicity I'd do something like this:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.optimize import leastsq
    
    def data_func( x, x0, a, bc, off, p1, p2):
        """
        fit function that uses modulus to get periodicity
        two linear functions are combines piecewise by boxing them
        using the heaviside function with the periodic input
        over all slope is added.
        Note that the "decay" part maybe positive with this solution.
        """
        P1 = abs(p1)
        P2 = abs(p2)
        X = x - x0
        P= P1 + P2
        mod = X % P
        y0 = a * P1
        beta = y0 * P / P2
        slope = y0 / P2
        box1 =  np.heaviside( +np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) - 0.5 * P2, .5 )
        box2 =  np.heaviside( -np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) + 0.5 * P2, .5 )
        out = a * mod * box1 
        out += (beta - slope * mod  )* box2
        out += off + bc * X
        return out
    
    
    def residuals( params, xl ,yl ):
        x0, a, bc, off, p1, p2 = params
        diff = np.fromiter( ( y - data_func( x, x0, a, bc, off, p1, p2 ) for x, y in zip( xl, yl )  ), np.float )
        return diff
    
    
    
    theOff = 0.7
    theP1= 1.8869
    theP2 = 5.21163
    theP = theP1 + theP2
    xdata = np.linspace(-1, 26, 51 )
    xdata = np.fromiter( ( x for x in xdata if (x-theOff)%theP <= theP1 ),np.float )
    ydata = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in xdata ),np.float )
    
    tl = np.linspace(-1, 26, 150 )
    yl = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in tl ),np.float )
    
    guess= [0, 0.55, .1, 16 , 2, 5 ]
    sol, err = leastsq( residuals, guess, args = ( xdata, ydata ) )
    print sol
    ### getting the real slopes out of the data
    s1 = sol[1]+ sol[2] 
    s2 =  - sol[1] * sol[4] / sol[5] + sol[2]
    print "real slope1 = {}".format( s1 )
    print "real slope2 = {}".format( s2 )
    
    fit = np.fromiter( ( data_func( x, *sol ) for x in tl ),np.float )
    fig = plt.figure()
    ax = fig.add_subplot( 1, 1, 1 )
    
    ### original data
    ax.plot( tl, yl, ls='--')
    ax.plot( xdata, ydata, ls='', marker='+')
    ### fit
    ax.plot( tl, fit )
    
    ### check the slopes
    short = np.linspace(0, 3, 3)
    ax.plot( short, [ 17 + s1 * s for s in short ] )
    short = np.linspace(3, 10, 3)
    ax.plot( short, [ 18 + s2 * s for s in short ] )
    
    ax.grid()
    plt.show()
    

    which gives:

    >> [ 0.39352332  0.59149625  0.10850375 16.78546632  1.85009228  5.35049099]
    >> real slope1 = 0.7
    >> real slope2 = -0.0960237685357
    

    and

    periodic linear

    Naturally, the lack of information in the gaps results in a rather bad fit of the slope there. As a consequence there is an according error in the periodicity. If that would be known, the accuracy would increase of course.

    You need a reasonable good guess for the starting parameters!