CSV file. df
before resample and after applying:
df["dateandtime"] = (pd.to_datetime(df.pop("DATE").str.cat(df.pop("TIME"), sep=" ")))
df = df.set_index(pd.DatetimeIndex(df.pop("dateandtime")))
dateandtime | Open | High | Low | Close | VOLUME |
---|---|---|---|---|---|
2020-03-11 00:00:00-04:00 | 2812.75 | 2813.25 | 2811.25 | 2811.25 | 296 |
2020-03-11 00:01:00-04:00 | 2811.25 | 2811.5 | 2809.25 | 2809.5 | 359 |
2020-03-11 00:02:00-04:00 | 2809.25 | 2810 | 2808.25 | 2809.5 | 189 |
2020-03-11 00:03:00-04:00 | 2809.5 | 2809.5 | 2806.5 | 2806.75 | 602 |
2020-03-11 00:04:00-04:00 | 2806.5 | 2809.75 | 2806.5 | 2809 | 299 |
How do I resample this 1 minute candlestick data into 15 minute data? I tried:
from dateutil.tz import gettz
import pandas as pd
import finplot as fplt
df = pd.read_csv('/home/user/Documents/finance/fin-smart/lab/ES.csv')
df.rename(columns={'OPEN': 'Open'}, inplace=True)
df.rename(columns={'HIGH': 'High'}, inplace=True)
df.rename(columns={'LOW': 'Low'}, inplace=True)
df.rename(columns={'CLOSE': 'Close'}, inplace=True)
df["dateandtime"] = (pd.to_datetime(df.pop("DATE").str.cat(df.pop("TIME"), sep=" ")))
df = df.set_index(pd.DatetimeIndex(df.pop("dateandtime")))
# first day is 2010.01.04
# last day is 2020.03.13
start_date = '2020.03.12'
end_date = '2020.03.13'
df = df.loc[start_date:end_date]
df.resample("15T").agg(
Open=("Open", "first"),
High=("High", "max"),
Low=("Low", "min"),
Close=("Close", "last"),
)
fplt.display_timezone = gettz('America/Chicago')
fplt.candlestick_ochl(df[['Open', 'Close', 'High', 'Low']])
fplt.show()
It gives the same output as without df.resample()
and no error or warning. Is df
in an incorrect format?
The following code works for me (given your input file).
Not sure what exactly is wrong with your code, but it seems to me that you are doing some unnecessary stuff, and can simplify as follows:
df = pd.read_csv('ES.csv')
df.index = pd.DatetimeIndex(df['DATE']+' '+df['TIME'])
df.drop(['DATE','TIME'],axis=1,inplace=True)
df.rename(columns={'OPEN' : 'Open',
'HIGH' : 'High',
'LOW' : 'Low',
'CLOSE': 'Close',
'VOLUME':'Volume'},inplace=True)
print(df.shape)
print(df.head())
print()
print(df.tail())
output from above prints:
(3535212, 5)
Open High Low Close Volume
2010-01-04 00:01:00 1113.75 1114.75 1113.25 1114.25 2185
2010-01-04 00:02:00 1114.25 1115.25 1114.25 1114.75 437
2010-01-04 00:03:00 1114.75 1114.75 1114.25 1114.50 596
2010-01-04 00:04:00 1114.25 1114.75 1114.25 1114.50 131
2010-01-04 00:05:00 1114.50 1114.50 1114.25 1114.25 166
Open High Low Close Volume
2020-04-07 10:46:00 2716.25 2716.50 2713.0 2714.75 504
2020-04-07 10:47:00 2714.75 2715.75 2713.0 2713.25 650
2020-04-07 10:48:00 2713.50 2713.50 2711.0 2711.25 611
2020-04-07 10:49:00 2711.00 2713.25 2709.5 2712.25 777
2020-04-07 10:50:00 2712.50 2714.50 2712.5 2714.00 297
now resample:
tdf = df.loc['2020.03.12':'2020.03.13']
ndf = tdf.resample('15T').agg(
{'Open' :'first',
'High' :'max',
'Low' :'min',
'Close' :'last',
'Volume':'sum'
})
print(ndf.shape)
print(ndf.head())
print()
print(ndf.tail())
output from the above prints of the resampled dataframe:
(185, 5)
Open High Low Close Volume
2020-03-12 00:00:00 2755.00 2755.50 2739.25 2739.50 952
2020-03-12 00:15:00 2740.00 2742.00 2731.75 2735.75 730
2020-03-12 00:30:00 2735.75 2739.00 2727.50 2730.75 464
2020-03-12 00:45:00 2732.00 2736.75 2715.25 2723.75 935
2020-03-12 01:00:00 2722.75 2731.00 2716.75 2717.00 1166
Open High Low Close Volume
2020-03-13 21:00:00 2670.00 2697.25 2666.25 2687.50 267212
2020-03-13 21:15:00 2687.75 2688.75 2679.50 2685.25 13637
2020-03-13 21:30:00 2678.25 2678.75 2643.00 2665.75 19611
2020-03-13 21:45:00 2665.50 2668.25 2651.75 2655.75 12474
2020-03-13 22:00:00 2656.25 2658.00 2650.00 2652.75 2152