(This questions is a follow up of How to use pandas Grouper with 7d frequency and fill missing days with 0?)
I have the following dataframe
df = pd.DataFrame({
'names': ['joe', 'joe', 'joe'],
'dates': [dt.datetime(2019,6,1), dt.datetime(2019,6,5), dt.datetime(2019,7,1)],
'start_date': dt.datetime(2019,5,1),
'end_date': dt.datetime(2019,7,5),
'values': [5,2,13]
})
names dates start_date end_date values
0 joe 2019-06-01 2019-05-01 2019-07-05 5
1 joe 2019-06-05 2019-05-01 2019-07-05 2
2 joe 2019-07-01 2019-05-01 2019-07-05 13
and I want to resample the data in 7 days blocks, with date ranges defined by start_date
and end_date
.
I was able to achieve the result I'm looking for by using the following workaround
temp_df = pd.DataFrame({
'dates': [df.start_date.tolist()[0], dt.end_date.tolist()[0]],
'names': df.names.tolist()[0],
'values': 0
})
concat_df = pd.concat([df, temp_df], axis=0, sort=True)
concat_df.set_index('dates').groupby('names').resample('7D').sum()
which outputs
values
names dates
joe 2019-05-01 0
2019-05-08 0
2019-05-15 0
2019-05-22 0
2019-05-29 5
2019-06-05 2
2019-06-12 0
2019-06-19 0
2019-06-26 13
2019-07-03 0
which is what I'm looking for.
I'm pretty sure there are better ways to achieve that. Do you have any suggestion?
Thanks!
You just need to reindex you current time series, avoiding the pd.concat
can yield better results.
df.set_index('dates', inplace=True)
def groupbyResample(groupby):
date_range = pd.date_range(groupby.start_date.min(), groupby.end_date.max(),
freq='d', name='date_index')
return date_range.to_frame(name='dates').join(groupby).resample('7D')['values'].sum()
df.groupby('names').apply(groupbyResample).stack()
Result:
names date_index
joe 2019-05-01 0.0
2019-05-08 0.0
2019-05-15 0.0
2019-05-22 0.0
2019-05-29 5.0
2019-06-05 2.0
2019-06-12 0.0
2019-06-19 0.0
2019-06-26 13.0
2019-07-03 0.0
dtype: float64
Using magic function %%timeit
:
%%timeit
def groupbyResample(groupby):
..
7.9 ms ± 352 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Your Approach:
%%timeit
temp_df = pd.DataFrame({ ..
9.1 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There isn't much performance-enhancement, however, using a function does scale well and you won't have write a lot of code yourself. That being said, I still feel that there is a more idiomatic way to do it.