I am trying to implement Lagrange interpolation on a time series. My input is in the format below, with two columns: datetime and stock value.
'3/8/2012 16:00:00 32.21'
'3/9/2012 16:00:00 32.16'
'3/12/2012 16:00:00 32.2'
'3/13/2012 16:00:00 Missing_1'
'3/14/2012 16:00:00 32.88'
'3/15/2012 16:00:00 32.94'
'3/16/2012 16:00:00 32.95'
'3/19/2012 16:00:00 32.61'
'3/20/2012 16:00:00 32.15'
'3/21/2012 16:00:00 Missing_2'
'3/22/2012 16:00:00 32.09'
'3/23/2012 16:00:00 32.11'
'3/26/2012 16:00:00 Missing_3'
In some of the input rows the stock value is missing. I am trying to predict these missing values using scipy.interpolate:
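import datetime
import time

import numpy as np
import scipy.interpolate

def is_number(s):
    # True if s can be parsed as a float, False for markers like "Missing_1"
    try:
        float(s)
        return True
    except ValueError:
        return False

x_axis, y_axis, unknown_x = [], [], []
for k in a:  # a is the input list
    x, y = k.split("\t")
    if is_number(y):
        # known value: keep the timestamp (seconds since 1970) and the price
        x = datetime.datetime.strptime(x, "%m/%d/%Y %H:%M:%S")
        x = time.mktime(x.timetuple())
        y = float(y)
        x_axis.append(x)
        y_axis.append(y)
    else:
        # missing value: remember only the timestamp to interpolate at
        x = datetime.datetime.strptime(x, "%m/%d/%Y %H:%M:%S")
        x = time.mktime(x.timetuple())
        unknown_x.append(x)

x = np.array(x_axis)
y = np.array(y_axis)
unknown = np.array(unknown_x)
y_interp = scipy.interpolate.lagrange(x, y)
for k in unknown:
    print(y_interp(k))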
But I am getting a runtime warning:
/var/ml/python/local/lib/python2.7/site-packages/numpy/lib/polynomial.py:728: RuntimeWarning: invalid value encountered in add
val = NX.concatenate((zr, a1)) + a2
/var/ml/python/local/lib/python2.7/site-packages/numpy/lib/polynomial.py:725: RuntimeWarning: invalid value encountered in add
val = a1 + a2
Your x values from the date conversion are very large (seconds since 1970, on the order of 1.3e9). Feeding those into a (Lagrange) polynomial and then interpolating likely leads to numerically unstable calculations, since for large x you need very small coefficients to obtain a relatively small y.
In addition, the documentation for scipy.interpolate.lagrange warns that the implementation is numerically unstable.
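To get a feel for the magnitudes involved, here is a small sketch that converts the first timestamp from the question to epoch seconds and raises it to the ninth power, which is the highest power a polynomial through ten known points would contain. It uses calendar.timegm (treating the timestamp as UTC) instead of time.mktime so that the result does not depend on the local time zone; that substitution is my assumption, not what the question's code does.

```python
import calendar
import datetime

# First timestamp from the question, interpreted as UTC (an assumption;
# the question uses time.mktime, which depends on the local time zone).
stamp = datetime.datetime(2012, 3, 8, 16, 0, 0)
x = float(calendar.timegm(stamp.timetuple()))

# Highest power of x in a polynomial through ten known points.
x_pow9 = x ** 9

print(x)        # epoch seconds, roughly 1.3e9
print(x_pow9)   # already around 1e82 for a single term
```

With more known points the degree grows, and these powers can exceed the range of a double altogether.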
Always normalize your data to reasonable magnitudes. You could subtract a reference date (the default origin is 1970, Unix time zero, which is a poor choice here; with your example dates, pick e.g. the first of March 2012), or divide by a value somewhere in the middle of your range (1332000000 could be a good value) to get all your values around 1.
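A minimal sketch of the subtract-a-reference-date idea, using the sample rows from the question. The names rows, reference and to_days are hypothetical, the tab separator is assumed from the split("\t") in the question, and measuring x in days rather than seconds is an additional choice of mine to keep the numbers small.

```python
import datetime

import numpy as np
from scipy.interpolate import lagrange

# Sample rows from the question; tab-separated is an assumption.
rows = [
    "3/8/2012 16:00:00\t32.21",
    "3/9/2012 16:00:00\t32.16",
    "3/12/2012 16:00:00\t32.2",
    "3/13/2012 16:00:00\tMissing_1",
    "3/14/2012 16:00:00\t32.88",
    "3/15/2012 16:00:00\t32.94",
    "3/16/2012 16:00:00\t32.95",
    "3/19/2012 16:00:00\t32.61",
    "3/20/2012 16:00:00\t32.15",
    "3/21/2012 16:00:00\tMissing_2",
    "3/22/2012 16:00:00\t32.09",
    "3/23/2012 16:00:00\t32.11",
    "3/26/2012 16:00:00\tMissing_3",
]

FMT = "%m/%d/%Y %H:%M:%S"
reference = datetime.datetime(2012, 3, 1)  # hypothetical reference date


def to_days(stamp):
    """Days elapsed since the reference date (small numbers instead of ~1.3e9)."""
    delta = datetime.datetime.strptime(stamp, FMT) - reference
    return delta.total_seconds() / 86400.0


known_x, known_y, unknown_x = [], [], []
for row in rows:
    stamp, value = row.split("\t")
    try:
        y = float(value)
    except ValueError:
        # "Missing_n" marker: remember where to interpolate
        unknown_x.append(to_days(stamp))
    else:
        known_x.append(to_days(stamp))
        known_y.append(y)

poly = lagrange(np.array(known_x), np.array(known_y))
estimates = poly(np.array(unknown_x))

for day, estimate in zip(unknown_x, estimates):
    print(day, estimate)
```

Note that the last missing row (3/26) lies outside the range of known dates, so its value is an extrapolation and should be treated with more suspicion than the two interior ones.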