I would like to resample a DataFrame hourly while keeping the category variable. How can I do that efficiently? I normally use df = df.resample('h').sum(),
but this does not work with my category variable. Any ideas?
import pandas as pd

date = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00']
value = [33.24, 31.71, 34.39, 34.49]
value2 = [2*x for x in value]
value3 = [3*x for x in value]
cat = ['a','a','b','b']
df = pd.DataFrame({'value':value,'value2':value2,'value3':value3,'index':date,'category':cat})
# the timestamps include seconds, so the format string needs %S
df.index = pd.to_datetime(df['index'], format='%Y-%m-%d %H:%M:%S')
df.drop(['index'], axis=1, inplace=True)
print(df.head())
                     value  value2  value3  category
index
2015-02-03 23:00:00  33.24   66.48   99.72         a
2015-02-03 23:30:00  31.71   63.42   95.13         a
2015-02-04 00:00:00  34.39   68.78  103.17         b
2015-02-04 00:30:00  34.49   68.98  103.47         b
Expected result:
                     value  value2  value3  category
index
2015-02-03 23:00:00  64.95   129.9  194.85         a
2015-02-04 00:00:00  68.88  137.76  206.64         b
Use DataFrameGroupBy.resample, which means chaining groupby with resample:
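# select the value columns so 'category' is used only as the group key;
# assign to a new name so df stays intact for the next example
df1 = df.groupby('category')[['value', 'value2', 'value3']].resample('h').sum()
print(df1)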
                              value  value2  value3
category index
a        2015-02-03 23:00:00  64.95  129.90  194.85
b        2015-02-04 00:00:00  68.88  137.76  206.64
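A self-contained sketch of the groupby/resample approach, which also moves 'category' out of the index and back into a column so the result has the layout of the expected output. The value columns are selected explicitly so the string column is never passed to sum(); newer pandas versions warn when groupby(...).resample(...) operates on the grouping column. The names num_cols and out are illustrative only.

```python
import pandas as pd

# Rebuild the sample frame from the question
date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00',
        '2015-02-04 00:00:00', '2015-02-04 00:30:00']
value = [33.24, 31.71, 34.39, 34.49]
df = pd.DataFrame(
    {'value': value,
     'value2': [2 * x for x in value],
     'value3': [3 * x for x in value],
     'category': ['a', 'a', 'b', 'b']},
    index=pd.DatetimeIndex(date, name='index'),
)

num_cols = ['value', 'value2', 'value3']  # illustrative: the columns to sum

# One hourly resample per category; only the numeric columns are aggregated
out = df.groupby('category')[num_cols].resample('h').sum()

# Move 'category' from the index into a column, sort by time, restore column order
out = out.reset_index(level='category').sort_index()[num_cols + ['category']]
print(out)
```

Because each category is resampled separately, a category only gets rows for the hours spanned by its own data, so no empty hour rows are created for the other category.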
Alternatively, use pd.Grouper:
df2 = df.groupby(['category', pd.Grouper(freq='h')]).sum()
print(df2)
                              value  value2  value3
category index
a        2015-02-03 23:00:00  64.95  129.90  194.85
b        2015-02-04 00:00:00  68.88  137.76  206.64
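If every hour contains only one category, as in the sample data, a plain resample with a per-column aggregation also works: sum the value columns and keep the first category seen in each hour. This is a sketch under that assumption; if an hour ever mixed categories, 'first' would silently hide the others, and the groupby approaches are the safer choice.

```python
import pandas as pd

# Rebuild the sample frame from the question
date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00',
        '2015-02-04 00:00:00', '2015-02-04 00:30:00']
value = [33.24, 31.71, 34.39, 34.49]
df = pd.DataFrame(
    {'value': value,
     'value2': [2 * x for x in value],
     'value3': [3 * x for x in value],
     'category': ['a', 'a', 'b', 'b']},
    index=pd.DatetimeIndex(date, name='index'),
)

# Sum the numeric columns per hour and keep the first category in each bin.
# Assumption: each hourly bin holds a single category.
agg = {'value': 'sum', 'value2': 'sum', 'value3': 'sum', 'category': 'first'}
out = df.resample('h').agg(agg)
print(out)
```

Unlike the groupby versions, this produces one row per hour across the whole time range, so an hour with no data would appear with zero sums and a missing category.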