Let's say I have a set of data (x=times,y=observation)
that have multiple gaps in time. Whatever is the data trend, let's assume it linear for this discussion. During the gaps in time, there is a decay that makes data deviate from the purely linear trend, until observations start again and the linear trend is recovered. There are multiple gaps in time, but in this example I have reported only the shortest snap to illustrate the problem. The gaps in time are times between the (positive) linear trends where there are no observations available, therefore the difference between consecutive x=times
is (much) larger than, let's say, the average. I want to model the decay as part of the function (y_decay = C -D*x
)
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def f(x, A, B, C, D):
line = A*x + B if ((x>=1) & (x<=3) | (x>=5) & (x<=9) | (x>=23) & (x<=25)) else C-D*x
return line
x=[1,2,3, 12,13,14, 23,24,25]
y=[2,4,6, 5, 7, 9, 8, 10,12]
popt, pcov = curve_fit(f, x, y)
figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()
plot.scatter(x,y)
plt.show()
How do I model the decay
variable as part of the function and obtain its best-fit value?
With full periodicity I'd do something like this:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq
def data_func( x, x0, a, bc, off, p1, p2):
"""
fit function that uses modulus to get periodicity
two linear functions are combines piecewise by boxing them
using the heaviside function with the periodic input
over all slope is added.
Note that the "decay" part maybe positive with this solution.
"""
P1 = abs(p1)
P2 = abs(p2)
X = x - x0
P= P1 + P2
mod = X % P
y0 = a * P1
beta = y0 * P / P2
slope = y0 / P2
box1 = np.heaviside( +np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) - 0.5 * P2, .5 )
box2 = np.heaviside( -np.abs( ( X - P1 / 2. ) % P - 0.5 * P ) + 0.5 * P2, .5 )
out = a * mod * box1
out += (beta - slope * mod )* box2
out += off + bc * X
return out
def residuals( params, xl ,yl ):
x0, a, bc, off, p1, p2 = params
diff = np.fromiter( ( y - data_func( x, x0, a, bc, off, p1, p2 ) for x, y in zip( xl, yl ) ), np.float )
return diff
theOff = 0.7
theP1= 1.8869
theP2 = 5.21163
theP = theP1 + theP2
xdata = np.linspace(-1, 26, 51 )
xdata = np.fromiter( ( x for x in xdata if (x-theOff)%theP <= theP1 ),np.float )
ydata = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in xdata ),np.float )
tl = np.linspace(-1, 26, 150 )
yl = np.fromiter( ( data_func( x, theOff, .6, .1, 17, theP1, theP2) for x in tl ),np.float )
guess= [0, 0.55, .1, 16 , 2, 5 ]
sol, err = leastsq( residuals, guess, args = ( xdata, ydata ) )
print sol
### getting the real slopes out of the data
s1 = sol[1]+ sol[2]
s2 = - sol[1] * sol[4] / sol[5] + sol[2]
print "real slope1 = {}".format( s1 )
print "real slope2 = {}".format( s2 )
fit = np.fromiter( ( data_func( x, *sol ) for x in tl ),np.float )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
### original data
ax.plot( tl, yl, ls='--')
ax.plot( xdata, ydata, ls='', marker='+')
### fit
ax.plot( tl, fit )
### check the slopes
short = np.linspace(0, 3, 3)
ax.plot( short, [ 17 + s1 * s for s in short ] )
short = np.linspace(3, 10, 3)
ax.plot( short, [ 18 + s2 * s for s in short ] )
ax.grid()
plt.show()
which gives:
>> [ 0.39352332 0.59149625 0.10850375 16.78546632 1.85009228 5.35049099]
>> real slope1 = 0.7
>> real slope2 = -0.0960237685357
and
Naturally, the lack of information in the gaps results in a rather bad fit of the slope there. As a consequence there is an according error in the periodicity. If that would be known, the accuracy would increase of course.
You need a reasonable good guess for the starting parameters!