Search code examples
pythonpandasdatetimelogic

Pandas, datetime & logic: Summarizing the sum of specific rows in a column


I got a DataFrame:

    date        start       end         inter
0   01-09-2020  10:01:24    10:02:13    0 days 00:00:49
1   01-09-2020  10:04:21    10:22:01    0 days 00:17:40
2   01-09-2020  10:16:14    10:25:06    0 days 00:08:52
3   01-09-2020  10:28:38    10:28:40    0 days 00:00:02
4   01-09-2020  10:37:38    10:37:41    0 days 00:00:03
... ... ... ... ...
995 17-09-2020  12:19:03    12:21:06    0 days 00:02:03
996 17-09-2020  12:22:53    12:22:58    0 days 00:00:05
997 17-09-2020  12:25:11    12:25:14    0 days 00:00:03
998 17-09-2020  12:27:07    12:27:08    0 days 00:00:01
999 17-09-2020  12:29:03    12:29:05    0 days 00:00:02
1000 rows × 4 columns

I wanna to create a new df, but with the sum of 'inter' in a specific date time range. For example:

    new_date    start_range end_range   inter_sum
0   01-09-2020  10:00:00    10:59:59    0 days 01:15:36 
1   01-09-2020  11:00:00    11:59:59    0 days 00:58:30
...
997 17-09-2020  10:00:00    10:59:59    0 days 03:00:15
998 17-09-2020  11:00:00    11:59:59    0 days 00:47:20

Where 'inter_sum' is the sum of 'inter' values between the 'start_range' and 'end_range' based on 'start' and 'end' of previous df.


Solution

  • Try with resample:

    #convert date column to datetime if needed
    df["date"] = pd.to_datetime(df["date"])
    
    #convert time column to datetime for resample
    df["start"] = pd.to_datetime(df["date"].astype(str)+"T"+df["start"].astype(str))
    
    #resample on start datetime every hour and sum
    output = df.resample("H", on="start")["inter"].sum()
    
    #formatting to match expected output
    output["date"] = output["start"].dt.date
    output["end"] = (output["start"] + pd.Timedelta(minutes=59, seconds=59)).dt.time
    output["start"] = output["start"].dt.time
    output = output[["date", "start", "end", "inter_sum"]]
    
    >>> output
                date     start       end       inter_sum
    0     2020-01-09  10:00:00  10:59:59 0 days 00:27:26
    1     2020-01-09  11:00:00  11:59:59 0 days 00:00:00
    2     2020-01-09  12:00:00  12:59:59 0 days 00:00:00
    3     2020-01-09  13:00:00  13:59:59 0 days 00:00:00
    4     2020-01-09  14:00:00  14:59:59 0 days 00:00:00
             ...       ...       ...             ...
    6046  2020-09-17  08:00:00  08:59:59 0 days 00:00:00
    6047  2020-09-17  09:00:00  09:59:59 0 days 00:00:00
    6048  2020-09-17  10:00:00  10:59:59 0 days 00:00:00
    6049  2020-09-17  11:00:00  11:59:59 0 days 00:00:00
    6050  2020-09-17  12:00:00  12:59:59 0 days 00:02:14