Search code examples
pandastime-seriesdate-range

Pandas Dataframe - Convert time interval to continuous time series


I have an energy data with a start time and end time interval. I want to distribute the total energy to the continuous time series (dividing total energy to total hours). As I searched the results, I have found staircase and daterange functions. However, with these results I couldn't distribute energy and sum same intervals. Sample df;

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

I want to distribute each row and sum same intervals with 15Min resolution. Note that I need TotalEnergy/TotalTime.

My trials;

Staircase Code:

import pandas as pd
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
import staircase as sc
stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
result = stepfunction(times, include_index=True)
result = pd.DataFrame(result)

Result 1:

2019-01-01 00:00:00  10
2019-01-01 00:15:00  10
2019-01-01 00:30:00  10
2019-01-01 00:45:00  10
2019-01-01 01:00:00  15
2019-01-01 01:15:00  15
2019-01-01 01:30:00  15
2019-01-01 01:45:00  15
2019-01-01 02:00:00  12
2019-01-01 02:15:00  12
2019-01-01 02:30:00  12
2019-01-01 02:45:00  12
2019-01-01 03:00:00   2
2019-01-01 03:15:00   2
2019-01-01 03:30:00   2
2019-01-01 03:45:00   2
2019-01-01 04:00:00   0

New code according to Riley, the author of "Staircase"

import pandas as pd

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"]),

import staircase as sc

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
bins = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
sf.slice(pd.IntervalIndex.from_breaks(times)).mean()

Output:

(2019-01-01, 2019-01-01 00:15:00]             10.0
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    10.0
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    10.0
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    10.0
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    15.0
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    15.0
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    15.0
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    15.0
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    12.0
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    12.0
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    12.0
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    12.0
(2019-01-01 03:00:00, 2019-01-01 03:15:00]     2.0
(2019-01-01 03:15:00, 2019-01-01 03:30:00]     2.0
(2019-01-01 03:30:00, 2019-01-01 03:45:00]     2.0
(2019-01-01 03:45:00, 2019-01-01 04:00:00]     2.0

However, this is not what I want. First energy is 10kWh, time interval is 3 hours, that means 12*15 minutes. So each bin should contain 10kWh/12bins = 0.83kWh. The stepfunction should look like maybe;

stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy"/("Stop"-"Start")

The desired output (calculated by hand):

(2019-01-01, 2019-01-01 00:15:00]             0.83
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    0.83
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    0.83
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    0.83
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    2.08
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    2.08
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    2.08
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    2.08
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    1.08
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    1.08
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    1.08
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    1.08
(2019-01-01 03:00:00, 2019-01-01 03:15:00]    0.25
(2019-01-01 03:15:00, 2019-01-01 03:30:00]    0.25
(2019-01-01 03:30:00, 2019-01-01 03:45:00]    0.25
(2019-01-01 03:45:00, 2019-01-01 04:00:00]    0.25

Solution

  • I'm not 100% clear on what is needed, but if it is to essentially take the average (i.e. "distribute" as you say) the energy within 15 minute intervals then the below approach with staircase can be used.

    setup

    import pandas as pd
    
    df = pd.DataFrame({
        "Start":["2019-08-27 14:52:00", "2019-03-01 10:14:05", "2019-07-31 12:54:10", "2019-06-16 10:55:57", "2019-03-07 21:21:04"],
        "Stop":["2019-08-27 17:58:19", "2019-03-01 13:13:54", "2019-07-31 13:21:45", "2019-06-16 14:35:14", "2019-03-08 07:22:59"],
        "TotalEnergy":[9.86, 9.38, 1.45, 38.77, 26.14],
    })
    
    df["Start"] = pd.to_datetime(df["Start"])
    df["Stop"] = pd.to_datetime(df["Stop"])
    

    df looks like this

                    Start                Stop  TotalEnergy
    0 2019-08-27 14:52:00 2019-08-27 17:58:19         9.86
    1 2019-03-01 10:14:05 2019-03-01 13:13:54         9.38
    2 2019-07-31 12:54:10 2019-07-31 13:21:45         1.45
    3 2019-06-16 10:55:57 2019-06-16 14:35:14        38.77
    4 2019-03-07 21:21:04 2019-03-08 07:22:59        26.14
    

    solution

    create step function

    import staircase as sc
    
    sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
    

    create 15 minute bins (a pandas.IntervalIndex)

    times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
    bins = pd.IntervalIndex.from_breaks(times)
    

    slice the step function into bins and take the mean to get a pandas.Series indexed by the bins

    sf.slice(bins).mean()
    

    The result looks like this

    (2019-03-01 10:14:05, 2019-03-01 10:29:05]    9.38
    (2019-03-01 10:29:05, 2019-03-01 10:44:05]    9.38
    (2019-03-01 10:44:05, 2019-03-01 10:59:05]    9.38
    (2019-03-01 10:59:05, 2019-03-01 11:14:05]    9.38
    (2019-03-01 11:14:05, 2019-03-01 11:29:05]    9.38
                                                ... 
    (2019-08-27 16:29:05, 2019-08-27 16:44:05]    9.86
    (2019-08-27 16:44:05, 2019-08-27 16:59:05]    9.86
    (2019-08-27 16:59:05, 2019-08-27 17:14:05]    9.86
    (2019-08-27 17:14:05, 2019-08-27 17:29:05]    9.86
    (2019-08-27 17:29:05, 2019-08-27 17:44:05]    9.86
    Length: 17214, dtype: float64
    

    Note this will take a bit of computation time. Averaging across 17214 is not a vectorised calculation unfortunately.

    updated solution

    The solution above doesn't solve the problem @ast was intending. The solution below will.

    After setup of dataframe, and conversion to timestamp etc create a column in your dataframe which represents the energy per hour rate

    interval_hrs = (df["Stop"] - df["Start"])/pd.Timedelta("1hr")
    df["TotalEnergyRate"] = df["TotalEnergy"]/interval_hrs
    

    This time we'll use the value of TotalEnergyRate to create the step function

    sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergyRate")
    

    What you have is a step function, whose value at any time will be the energy rate. If you want to know how much energy is consumed in a particular time period then the result is an integration. We'll slice as before, but this time call integral. This gives you a Series of pandas.Timedelta which you can then divide through by a 1 hour timedelta to convert back from Timedeltas to achieve the answer you are after.

    times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
    bins = pd.IntervalIndex.from_breaks(times)
    sf.slice(bins).integral()/pd.Timedelta("1hr")
    

    Disclaimer (required by stackoverflow): I am the creator of the staircase package.