Within Pandas, I would like to resample my dataframe and take the mean within a 5 hour period and within index level. My dataframe looks like: df
timestamp width length
name
10 2019-08-01 00:00:00 10.1 86.1
10 2019-08-01 00:00:10 10.0 86.2
10 2019-08-01 00:05:40 10.1 86.3
10 2019-08-01 00:05:50 10.0 86.2
8 2019-08-01 00:05:54 12.0 110.0
I would like to keep my 'name' variable as index (preferably not setting timestamp as index), like:
timestamp width length
name
10 2019-08-01 00:00:05 10.05 86.15
10 2019-08-01 00:05:45 10.05 86.25
8 2019-08-01 00:05:54 12.0 110.0
I tried:
df_resample = df.resample('5H', on='timestamp').mean()
But this will not perform within index level. Also it sets the datetime on the index which I try to avoid.
IIUC, you can use groupby
and resample
:
(df.groupby(level=0, sort=False)
.resample('5min', on='timestamp').mean()
.reset_index()
)
This however, does not average your timestamps, since you can't really add Datetime
type in pandas, although there are ways around that.
name timestamp width length
0 10 2019-08-01 00:00:00 10.05 86.15
1 10 2019-08-01 00:05:00 10.05 86.25
2 8 2019-08-01 00:05:00 12.00 110.00
Update If you want mean timestamp, you can temporary convert timestamp to int, taking mean, and convert back:
(df.assign(int_time=lambda x: x['timestamp'].astype('int64') )
.groupby(level=0, sort=False)
.resample('5min', on='timestamp').mean()
.reset_index()
.assign(timestamp=lambda x: x['int_time'].astype('int64').astype('datetime64[ns]'))
.drop('int_time', axis=1)
)
Output:
name timestamp width length
0 10 2019-08-01 00:00:05 10.05 86.15
1 10 2019-08-01 00:05:45 10.05 86.25
2 8 2019-08-01 00:05:54 12.00 110.00