Fit model function out defined data range

Let's say I have a set of data (x=times,y=observation) that have multiple gaps in time. Whatever is the data trend, let's assume it linear for this discussion. During the gaps in time, there is a decay that makes data deviate from the purely linear trend, until observations start again and the linear trend is recovered. There are multiple gaps in time, but in this example I have reported only the shortest snap to illustrate the problem. The gaps in time are times between the (positive) linear trends where there are no observations available, therefore the difference between consecutive x=times is (much) larger than, let's say, the average. I want to model the decay as part of the function (y_decay = C -D*x)

from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def f(x, A, B, C, D):
    line = A*x + B if ((x>=1) & (x<=3) | (x>=5) & (x<=9) | (x>=23) & (x<=25)) else C-D*x
    return line

x=[1,2,3, 12,13,14, 23,24,25]
y=[2,4,6, 5, 7, 9, 8, 10,12]
popt, pcov = curve_fit(f, x, y) 

figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()

plot.scatter(x,y)
plt.show()

How do I model the decay variable as part of the function and obtain its best-fit value?

Solution

With full periodicity I'd do something like this:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq

def data_func( x, x0, a, bc, off, p1, p2):
    """
    fit function that uses modulus to get periodicity
    two linear functions are combines piecewise by boxing them
    using the heaviside function with the periodic input
    over all slope is added.
    Note that the "decay" part maybe positive with this solution.
    """
    P1 = abs(p1)
    P2 = abs(p2)
    X = x - x0
    P= P1 + P2
    mod = X % P
    y0 = a * P1
    beta = y0 * P / P2
    slope = y0 / P2
    box1 =  np.heaviside( +np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) - 0.5 * P2, .5 )
    box2 =  np.heaviside( -np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) + 0.5 * P2, .5 )
    out = a * mod * box1 
    out += (beta - slope * mod  )* box2
    out += off + bc * X
    return out


def residuals( params, xl ,yl ):
    x0, a, bc, off, p1, p2 = params
    diff = np.fromiter( ( y - data_func( x, x0, a, bc, off, p1, p2 ) for x, y in zip( xl, yl )  ), np.float )
    return diff



theOff = 0.7
theP1= 1.8869
theP2 = 5.21163
theP = theP1 + theP2
xdata = np.linspace(-1, 26, 51 )
xdata = np.fromiter( ( x for x in xdata if (x-theOff)%theP <= theP1 ),np.float )
ydata = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in xdata ),np.float )

tl = np.linspace(-1, 26, 150 )
yl = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in tl ),np.float )

guess= [0, 0.55, .1, 16 , 2, 5 ]
sol, err = leastsq( residuals, guess, args = ( xdata, ydata ) )
print sol
### getting the real slopes out of the data
s1 = sol[1]+ sol[2] 
s2 =  - sol[1] * sol[4] / sol[5] + sol[2]
print "real slope1 = {}".format( s1 )
print "real slope2 = {}".format( s2 )

fit = np.fromiter( ( data_func( x, *sol ) for x in tl ),np.float )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )

### original data
ax.plot( tl, yl, ls='--')
ax.plot( xdata, ydata, ls='', marker='+')
### fit
ax.plot( tl, fit )

### check the slopes
short = np.linspace(0, 3, 3)
ax.plot( short, [ 17 + s1 * s for s in short ] )
short = np.linspace(3, 10, 3)
ax.plot( short, [ 18 + s2 * s for s in short ] )

ax.grid()
plt.show()

which gives:

>> [ 0.39352332  0.59149625  0.10850375 16.78546632  1.85009228  5.35049099]
>> real slope1 = 0.7
>> real slope2 = -0.0960237685357

and

Naturally, the lack of information in the gaps results in a rather bad fit of the slope there. As a consequence there is an according error in the periodicity. If that would be known, the accuracy would increase of course.

You need a reasonable good guess for the starting parameters!