Pandas how to resample between specific dates and sum values within date ranges?

(This questions is a follow up of How to use pandas Grouper with 7d frequency and fill missing days with 0?)

I have the following dataframe

df = pd.DataFrame({
    'names': ['joe', 'joe', 'joe'],
    'dates': [dt.datetime(2019,6,1), dt.datetime(2019,6,5), dt.datetime(2019,7,1)],
    'start_date': dt.datetime(2019,5,1),
    'end_date': dt.datetime(2019,7,5),
    'values': [5,2,13]
})

  names      dates start_date   end_date  values
0   joe 2019-06-01 2019-05-01 2019-07-05       5
1   joe 2019-06-05 2019-05-01 2019-07-05       2
2   joe 2019-07-01 2019-05-01 2019-07-05      13

and I want to resample the data in 7 days blocks, with date ranges defined by start_date and end_date.

I was able to achieve the result I'm looking for by using the following workaround

temp_df = pd.DataFrame({
    'dates': [df.start_date.tolist()[0], dt.end_date.tolist()[0]],
    'names': df.names.tolist()[0],
    'values': 0
})

concat_df = pd.concat([df, temp_df], axis=0, sort=True)

concat_df.set_index('dates').groupby('names').resample('7D').sum()

which outputs

                  values
names dates             
joe   2019-05-01       0
      2019-05-08       0
      2019-05-15       0
      2019-05-22       0
      2019-05-29       5
      2019-06-05       2
      2019-06-12       0
      2019-06-19       0
      2019-06-26      13
      2019-07-03       0

which is what I'm looking for.

I'm pretty sure there are better ways to achieve that. Do you have any suggestion?

Thanks!

Solution

You just need to reindex you current time series, avoiding the pd.concat can yield better results.

df.set_index('dates', inplace=True)
def groupbyResample(groupby):
    date_range = pd.date_range(groupby.start_date.min(), groupby.end_date.max(), 
                               freq='d', name='date_index')

    return date_range.to_frame(name='dates').join(groupby).resample('7D')['values'].sum()

df.groupby('names').apply(groupbyResample).stack()

Result:

names  date_index
joe    2019-05-01     0.0
       2019-05-08     0.0
       2019-05-15     0.0
       2019-05-22     0.0
       2019-05-29     5.0
       2019-06-05     2.0
       2019-06-12     0.0
       2019-06-19     0.0
       2019-06-26    13.0
       2019-07-03     0.0
dtype: float64

Using magic function %%timeit:

%%timeit
def groupbyResample(groupby):
..
7.9 ms ± 352 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Your Approach:

%%timeit
temp_df = pd.DataFrame({ ..
9.1 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

There isn't much performance-enhancement, however, using a function does scale well and you won't have write a lot of code yourself. That being said, I still feel that there is a more idiomatic way to do it.