Search code examples
rdatetimeposixct

Computing the sum of datetime intervals between rows within individuals (R)


I would like to mutate a new column that groups by ID and sum up the intervals between each of the sessions. Interval is defined as the difference between End time and its subsequent Start time. For example, for ID 1, this will be the sum of (2021-07-11 05:55:00 - 2021-07-11 01:14:00 = 281mins) and (2021-07-11 11:08:00 - 2021-07-11 08:09:00 = 179mins) which will be 460.

df <- structure(list(ID = c(1, 1, 1, 2, 2, 2), Start = structure(c(1625931780, 
1625954100, 1625972880, 1625505720, 1625517480, 1625526900), class = c("POSIXct", 
"POSIXt"), tzone = "Singapore"), End = structure(c(1625937240, 
1625962140, 1625981580, 1625513640, 1625523300, 1625531880), class = c("POSIXct", 
"POSIXt"), tzone = "Singapore"), n = c(3L, 3L, 3L, 3L, 3L, 3L
)), row.names = c(NA, 6L), class = "data.frame")

  ID               Start                 End n
1  1 2021-07-10 23:43:00 2021-07-11 01:14:00 3
2  1 2021-07-11 05:55:00 2021-07-11 08:09:00 3
3  1 2021-07-11 11:08:00 2021-07-11 13:33:00 3
4  2 2021-07-06 01:22:00 2021-07-06 03:34:00 3
5  2 2021-07-06 04:38:00 2021-07-06 06:15:00 3
6  2 2021-07-06 07:15:00 2021-07-06 08:38:00 3

Desired:

  ID               Start                 End n sumIntervals
1  1 2021-07-10 23:43:00 2021-07-11 01:14:00 3          460
2  1 2021-07-11 05:55:00 2021-07-11 08:09:00 3          460
3  1 2021-07-11 11:08:00 2021-07-11 13:33:00 3          460
4  2 2021-07-06 01:22:00 2021-07-06 03:34:00 3          124
5  2 2021-07-06 04:38:00 2021-07-06 06:15:00 3          124
6  2 2021-07-06 07:15:00 2021-07-06 08:38:00 3          124

Note: Start and End are in POSIXct and the number of sessions within each ID is not constant, so it can be any number of n. n=3 is used here for illustration.

Any help will be greatly appreciated!


Solution

  • We can take the lead of 'Start' after grouping by 'ID', use difftime to get the difference in 'mins' from the 'End' and sum the integer converted values

    library(dplyr)
    df <- df %>%
          group_by(ID) %>%
          mutate(new = sum(as.integer(difftime(lead(Start), End, 
                 units = 'mins')), na.rm = TRUE) ) %>%
          ungroup
    

    -output

    # A tibble: 6 x 5
         ID Start               End                     n   new
      <dbl> <dttm>              <dttm>              <int> <int>
    1     1 2021-07-10 23:43:00 2021-07-11 01:14:00     3   460
    2     1 2021-07-11 05:55:00 2021-07-11 08:09:00     3   460
    3     1 2021-07-11 11:08:00 2021-07-11 13:33:00     3   460
    4     2 2021-07-06 01:22:00 2021-07-06 03:34:00     3   124
    5     2 2021-07-06 04:38:00 2021-07-06 06:15:00     3   124
    6     2 2021-07-06 07:15:00 2021-07-06 08:38:00     3   124