Search code examples
pythonpandastime-seriesrolling-computationpandas-resample

aggregate and distribute time series data


I have some time series data in a pandas data frame like this:

begin end mw_values
2021-09-14 11:16:00 2021-09-14 11:27:11 0
2021-09-14 11:27:11 2021-09-14 11:30:00 100
2021-09-14 11:30:00 2021-09-14 11:33:59 1200
2021-09-14 11:33:59 2021-09-14 11:39:42 600
2021-09-14 11:39:42 2021-09-14 11:59:59 400

I need the sum of the mw_values distributed into 15 minutes time slots like this:

time_slots_15_min sum_mw_values
2021-09-14 11:00 0
2021-09-14 11:15 100
2021-09-14 11:30 2200
2021-09-14 11:45 0
2021-09-14 12:00 0

Does someone have any idea how I can achieve this?

Note that the intervals between begin and end may overlap 2 time slots. Then the value must be involved in the sum of the time slot where it begins; e.g. the mw_value of 400 in the example from above.


Solution

  • You can resample your dataframe so sum the data in 15 minute bins. Then you can reindex that frame so it matches your desired start/end/frequency times.

    freq = "15min"
    new_index = pd.date_range(
        "2021-09-14 11:00:00", "2021-09-14 12:00:00", freq=freq
    )
    
    out = (
        df.resample(freq, on="begin")["mw_values"]
        .sum()
        .reindex(new_index, fill_value=0)
        .to_frame("sum_mw_values")
    )
    
    print(out)
                         sum_mw_values
    2021-09-14 11:00:00              0
    2021-09-14 11:15:00            100
    2021-09-14 11:30:00           2200
    2021-09-14 11:45:00              0
    2021-09-14 12:00:00              0