Search code examples
pythonpandasdataframetime-seriesresampling

Pandas Resampling error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex


When using pandas' resample function on a DataFrame in order to convert tick data to OHLCV, a resampling error is encountered.

How should we solve the error?

enter image description here

# Resample data into 30min bins
bars = data.Price.resample('30min', how='ohlc')
volumes = data.Volume.resample('30min', how='sum')

This gives the error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'

Solution

  • Convert the integer timestamps in the index to a DatetimeIndex:

    data.index = pd.to_datetime(data.index, unit='s')
    

    This interprets the integers as seconds since the Epoch.


    For example, given

    data = pd.DataFrame(
        {'Timestamp':[1313331280, 1313334917, 1313334917, 1313340309, 1313340309], 
         'Price': [10.4]*3 + [10.5]*2, 'Volume': [0.779, 0.101, 0.316, 0.150, 1.8]})
    data = data.set_index(['Timestamp'])
    #             Price  Volume
    # Timestamp                
    # 1313331280   10.4   0.779
    # 1313334917   10.4   0.101
    # 1313334917   10.4   0.316
    # 1313340309   10.5   0.150
    # 1313340309   10.5   1.800
    
    data.index = pd.to_datetime(data.index, unit='s')
    

    yields

                         Price  Volume
    2011-08-14 14:14:40   10.4   0.779
    2011-08-14 15:15:17   10.4   0.101
    2011-08-14 15:15:17   10.4   0.316
    2011-08-14 16:45:09   10.5   0.150
    2011-08-14 16:45:09   10.5   1.800
    

    Then

    ticks = data.ix[:, ['Price', 'Volume']]
    bars = ticks.Price.resample('30min').ohlc()
    volumes = ticks.Volume.resample('30min').sum()
    

    can be computed:

    In [368]: bars
    Out[368]: 
                         open  high   low  close
    2011-08-14 14:00:00  10.4  10.4  10.4   10.4
    2011-08-14 14:30:00   NaN   NaN   NaN    NaN
    2011-08-14 15:00:00  10.4  10.4  10.4   10.4
    2011-08-14 15:30:00   NaN   NaN   NaN    NaN
    2011-08-14 16:00:00   NaN   NaN   NaN    NaN
    2011-08-14 16:30:00  10.5  10.5  10.5   10.5
    
    In [369]: volumes
    Out[369]: 
    2011-08-14 14:00:00    0.779
    2011-08-14 14:30:00      NaN
    2011-08-14 15:00:00    0.417
    2011-08-14 15:30:00      NaN
    2011-08-14 16:00:00      NaN
    2011-08-14 16:30:00    1.950
    Freq: 30T, Name: Volume, dtype: float64