Search code examples
pythonpandastime-serieslinear-interpolation

Python regularise irregular time series with linear interpolation


I have a time series in pandas that looks like this:

                     Values
1992-08-27 07:46:48    28.0  
1992-08-27 08:00:48    28.2  
1992-08-27 08:33:48    28.4  
1992-08-27 08:43:48    28.8  
1992-08-27 08:48:48    29.0  
1992-08-27 08:51:48    29.2  
1992-08-27 08:53:48    29.6  
1992-08-27 08:56:48    29.8  
1992-08-27 09:03:48    30.0

I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:

                     Values
1992-08-27 08:00:00    28.2  
1992-08-27 08:15:00    28.3  
1992-08-27 08:30:00    28.4  
1992-08-27 08:45:00    28.8  
1992-08-27 09:00:00    29.9

However using the resample method (df.resample('15Min')) from Pandas I get:

                     Values
1992-08-27 08:00:00   28.20  
1992-08-27 08:15:00     NaN  
1992-08-27 08:30:00   28.60  
1992-08-27 08:45:00   29.40  
1992-08-27 09:00:00   30.00  

I have tried the resample method with different how and fill_method parameters but never got exactly the results I wanted. Am I using the wrong method?


Solution

  • It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted is used to find dates closest to the resample point.

    # empty frame with desired index
    rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)
    
    # array of indexes corresponding with closest timestamp after resample
    idx_after = np.searchsorted(df.index.values, rs.index.values)
    
    # values and timestamp before/after resample
    rs['after'] = df.loc[df.index[idx_after], 'Values'].values
    rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
    rs['after_time'] = df.index[idx_after]
    rs['before_time'] = df.index[idx_after - 1]
    
    #calculate new weighted value
    rs['span'] = (rs['after_time'] - rs['before_time'])
    rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']
    # I got errors here unless I turn the index to a series
    rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']
    
    rs['Values'] = rs.eval('before * before_weight + after * after_weight')
    

    After all that, hopefully the right answer:

    In [161]: rs['Values']
    Out[161]: 
    1992-08-27 08:00:00    28.011429
    1992-08-27 08:15:00    28.313939
    1992-08-27 08:30:00    28.223030
    1992-08-27 08:45:00    28.952000
    1992-08-27 09:00:00    29.908571
    Freq: 15T, Name: Values, dtype: float64