python python-2.7 numpy interpolation data-analysis

RuntimeWarning while using Lagrange interpolation in NumPy


I am trying to apply Lagrange interpolation to a time series. My input is in the format below, with two columns: datetime and stock value.

'3/8/2012 16:00:00  32.21'
'3/9/2012 16:00:00  32.16'
'3/12/2012 16:00:00 32.2'
'3/13/2012 16:00:00 Missing_1'
'3/14/2012 16:00:00 32.88'
'3/15/2012 16:00:00 32.94'
'3/16/2012 16:00:00 32.95'
'3/19/2012 16:00:00 32.61'
'3/20/2012 16:00:00 32.15'
'3/21/2012 16:00:00 Missing_2'
'3/22/2012 16:00:00 32.09'
'3/23/2012 16:00:00 32.11'
'3/26/2012 16:00:00 Missing_3'

In some rows the stock value is missing. I am trying to predict these missing values using scipy.interpolate:

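import datetime
import time

import numpy as np
import scipy.interpolate


def is_number(s):
    """Return True if s can be parsed as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False

x_axis, y_axis = [], []  # timestamps and values of the known points
unknown_x = []           # timestamps of the missing points

for k in a: # a is input list
    x, y = k.split("\t")
    # Convert the date string to seconds since the Unix epoch (local time).
    x = datetime.datetime.strptime(x, "%m/%d/%Y %H:%M:%S")
    x = time.mktime(x.timetuple())
    if is_number(y):
        x_axis.append(x)
        y_axis.append(float(y))
    else:
        unknown_x.append(x)


x = np.array(x_axis)
y = np.array(y_axis)
unknown = np.array(unknown_x)
# Fit the interpolating polynomial through the known points,
# then evaluate it at the missing timestamps.
y_interp = scipy.interpolate.lagrange(x, y)
for k in unknown:
    print(y_interp(k))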

But I am getting these runtime warnings:

/var/ml/python/local/lib/python2.7/site-packages/numpy/lib/polynomial.py:728: RuntimeWarning: invalid value encountered in add
  val = NX.concatenate((zr, a1)) + a2
/var/ml/python/local/lib/python2.7/site-packages/numpy/lib/polynomial.py:725: RuntimeWarning: invalid value encountered in add
  val = a1 + a2
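For reference, NumPy emits "invalid value encountered in add" when an addition produces NaN from operands that are not themselves NaN, for example inf + (-inf). The small reproduction below is independent of the interpolation code; the arrays a1 and a2 are made up for illustration and only mirror the names in the warning's source line.

```python
import warnings

import numpy as np

# Made-up coefficient-like arrays: the first entries are infinities of
# opposite sign, which is what overflowing polynomial coefficients can become.
a1 = np.array([np.inf, 1.0])
a2 = np.array([-np.inf, 2.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    val = a1 + a2  # inf + (-inf) gives nan; 1.0 + 2.0 gives 3.0

print(val)
for w in caught:
    print(w.category.__name__, w.message)
```

Only the inf + (-inf) element triggers the warning; ordinary finite additions are silent.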

Solution

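  • The xs from your date conversion are large values (time.mktime returns seconds since 1970, around 1.33e9 for these dates). Putting those into a (Lagrange) polynomial and then interpolating likely leads to numerically unstable calculations: for large x to yield a relatively small y, you need very small coefficients, and the huge intermediate terms have to cancel. The warning itself most likely means that some intermediate coefficients overflowed to inf, so that adding them produced NaN (inf + (-inf)).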

    In addition, the documentation for scipy.interpolate.lagrange warns that the implementation is numerically unstable and that you should not expect to use more than about 20 points, even if they are chosen optimally.

    Always normalize your data to reasonable magnitudes. For example, subtract a reference date: time.mktime counts from 1970 (Unix time zero), which is obviously a bad choice here. With your example dates, pick e.g. 1 March 2012. Alternatively, divide by a value somewhere in the middle of the range (1332000000 could be a good value) to get all your values around 1.
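A minimal sketch of the subtract-a-reference-date idea, applied to the sample rows from the question. Assumptions not taken from the question: the columns are tab-separated (as its split("\t") implies), the reference date is 1 March 2012, and the offset is expressed in days (divided by 86400) rather than seconds; the helper to_days is hypothetical. Plain datetime subtraction replaces time.mktime, which also avoids any dependence on the local time zone and daylight-saving changes.

```python
import datetime

import numpy as np
from scipy.interpolate import lagrange

# Sample rows from the question. Assumption: the two columns are
# tab-separated, as the question's k.split("\t") implies.
rows = [
    "3/8/2012 16:00:00\t32.21",
    "3/9/2012 16:00:00\t32.16",
    "3/12/2012 16:00:00\t32.2",
    "3/13/2012 16:00:00\tMissing_1",
    "3/14/2012 16:00:00\t32.88",
    "3/15/2012 16:00:00\t32.94",
    "3/16/2012 16:00:00\t32.95",
    "3/19/2012 16:00:00\t32.61",
    "3/20/2012 16:00:00\t32.15",
    "3/21/2012 16:00:00\tMissing_2",
    "3/22/2012 16:00:00\t32.09",
    "3/23/2012 16:00:00\t32.11",
    "3/26/2012 16:00:00\tMissing_3",
]

REFERENCE = datetime.datetime(2012, 3, 1)  # assumed reference date near the data
SECONDS_PER_DAY = 86400.0


def to_days(stamp):
    """Hypothetical helper: days since REFERENCE instead of ~1.3e9 seconds."""
    when = datetime.datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return (when - REFERENCE).total_seconds() / SECONDS_PER_DAY


known_x, known_y, unknown_x = [], [], []
for row in rows:
    stamp, value = row.split("\t")
    x = to_days(stamp)
    try:
        y = float(value)
    except ValueError:  # "Missing_n" is not a number
        unknown_x.append(x)
    else:
        known_x.append(x)
        known_y.append(y)

# Degree-9 polynomial through the 10 known points, now with small x values.
poly = lagrange(np.array(known_x), np.array(known_y))
predicted = poly(np.array(unknown_x))
for x, y in zip(unknown_x, predicted):
    print("day %.3f -> %.4f" % (x, y))
```

With x values between roughly 7 and 26, the polynomial reproduces the known points up to small rounding error. The 3/26 row lies after the last known value, so its prediction is an extrapolation, which high-degree polynomials handle poorly.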