python pandas time-series rolling-computation pandas-resample

aggregate and distribute time series data

I have some time series data in a pandas data frame like this:

begin	end	mw_values
2021-09-14 11:16:00	2021-09-14 11:27:11	0
2021-09-14 11:27:11	2021-09-14 11:30:00	100
2021-09-14 11:30:00	2021-09-14 11:33:59	1200
2021-09-14 11:33:59	2021-09-14 11:39:42	600
2021-09-14 11:39:42	2021-09-14 11:59:59	400

I need the sum of the mw_values distributed into 15 minutes time slots like this:

time_slots_15_min	sum_mw_values
2021-09-14 11:00	0
2021-09-14 11:15	100
2021-09-14 11:30	2200
2021-09-14 11:45	0
2021-09-14 12:00	0

Does someone have any idea how I can achieve this?

Note that the intervals between begin and end may overlap 2 time slots. Then the value must be involved in the sum of the time slot where it begins; e.g. the mw_value of 400 in the example from above.

Solution

You can resample your dataframe so sum the data in 15 minute bins. Then you can reindex that frame so it matches your desired start/end/frequency times.

freq = "15min"
new_index = pd.date_range(
    "2021-09-14 11:00:00", "2021-09-14 12:00:00", freq=freq
)

out = (
    df.resample(freq, on="begin")["mw_values"]
    .sum()
    .reindex(new_index, fill_value=0)
    .to_frame("sum_mw_values")
)

print(out)
                     sum_mw_values
2021-09-14 11:00:00              0
2021-09-14 11:15:00            100
2021-09-14 11:30:00           2200
2021-09-14 11:45:00              0
2021-09-14 12:00:00              0