Search code examples
pythonpandascsvcandlestick-chart

How to resample 1 minute data into 15 minute data?


CSV file. df before resample and after applying:

df["dateandtime"] = (pd.to_datetime(df.pop("DATE").str.cat(df.pop("TIME"), sep=" ")))
df = df.set_index(pd.DatetimeIndex(df.pop("dateandtime")))
dateandtime Open High Low Close VOLUME
2020-03-11 00:00:00-04:00 2812.75 2813.25 2811.25 2811.25 296
2020-03-11 00:01:00-04:00 2811.25 2811.5 2809.25 2809.5 359
2020-03-11 00:02:00-04:00 2809.25 2810 2808.25 2809.5 189
2020-03-11 00:03:00-04:00 2809.5 2809.5 2806.5 2806.75 602
2020-03-11 00:04:00-04:00 2806.5 2809.75 2806.5 2809 299

How do I resample this 1 minute candlestick data into 15 minute data? I tried:

from dateutil.tz import gettz
import pandas as pd
import finplot as fplt


df = pd.read_csv('/home/user/Documents/finance/fin-smart/lab/ES.csv')
df.rename(columns={'OPEN': 'Open'}, inplace=True)
df.rename(columns={'HIGH': 'High'}, inplace=True)
df.rename(columns={'LOW': 'Low'}, inplace=True)
df.rename(columns={'CLOSE': 'Close'}, inplace=True)
df["dateandtime"] = (pd.to_datetime(df.pop("DATE").str.cat(df.pop("TIME"), sep=" ")))
df = df.set_index(pd.DatetimeIndex(df.pop("dateandtime")))

# first day is 2010.01.04
# last day is 2020.03.13
start_date = '2020.03.12'
end_date = '2020.03.13'
df = df.loc[start_date:end_date]

df.resample("15T").agg(
    Open=("Open", "first"),
    High=("High", "max"),
    Low=("Low", "min"),
    Close=("Close", "last"),
)

fplt.display_timezone = gettz('America/Chicago')
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']])
fplt.show()

It gives the same output as without df.resample() and no error or warning. Is df in an incorrect format?


Solution

  • The following code works for me (given your input file).

    Not sure what exactly is wrong with your code, but it seems to me that you are doing some unnecessary stuff, and can simplify as follows:

    df = pd.read_csv('ES.csv')
    
    df.index = pd.DatetimeIndex(df['DATE']+' '+df['TIME'])
    df.drop(['DATE','TIME'],axis=1,inplace=True)
    
    df.rename(columns={'OPEN' : 'Open',
                       'HIGH' : 'High',
                       'LOW'  : 'Low',
                       'CLOSE': 'Close',
                       'VOLUME':'Volume'},inplace=True)
    
    print(df.shape)
    print(df.head())
    print()
    print(df.tail())
    

    output from above prints:

    (3535212, 5)
                            Open     High      Low    Close  Volume
    2010-01-04 00:01:00  1113.75  1114.75  1113.25  1114.25    2185
    2010-01-04 00:02:00  1114.25  1115.25  1114.25  1114.75     437
    2010-01-04 00:03:00  1114.75  1114.75  1114.25  1114.50     596
    2010-01-04 00:04:00  1114.25  1114.75  1114.25  1114.50     131
    2010-01-04 00:05:00  1114.50  1114.50  1114.25  1114.25     166
    
                            Open     High     Low    Close  Volume
    2020-04-07 10:46:00  2716.25  2716.50  2713.0  2714.75     504
    2020-04-07 10:47:00  2714.75  2715.75  2713.0  2713.25     650
    2020-04-07 10:48:00  2713.50  2713.50  2711.0  2711.25     611
    2020-04-07 10:49:00  2711.00  2713.25  2709.5  2712.25     777
    2020-04-07 10:50:00  2712.50  2714.50  2712.5  2714.00     297
    

    now resample:

    tdf = df.loc['2020.03.12':'2020.03.13']
    
    ndf = tdf.resample('15T').agg(
        {'Open'  :'first',
         'High'  :'max',
         'Low'   :'min',
         'Close' :'last',
         'Volume':'sum'
        })
    
    print(ndf.shape)
    print(ndf.head())
    print()
    print(ndf.tail())
    
    

    output from the above prints of the resampled dataframe:

    (185, 5)
                            Open     High      Low    Close  Volume
    2020-03-12 00:00:00  2755.00  2755.50  2739.25  2739.50     952
    2020-03-12 00:15:00  2740.00  2742.00  2731.75  2735.75     730
    2020-03-12 00:30:00  2735.75  2739.00  2727.50  2730.75     464
    2020-03-12 00:45:00  2732.00  2736.75  2715.25  2723.75     935
    2020-03-12 01:00:00  2722.75  2731.00  2716.75  2717.00    1166
    
                            Open     High      Low    Close  Volume
    2020-03-13 21:00:00  2670.00  2697.25  2666.25  2687.50  267212
    2020-03-13 21:15:00  2687.75  2688.75  2679.50  2685.25   13637
    2020-03-13 21:30:00  2678.25  2678.75  2643.00  2665.75   19611
    2020-03-13 21:45:00  2665.50  2668.25  2651.75  2655.75   12474
    2020-03-13 22:00:00  2656.25  2658.00  2650.00  2652.75    2152