Search code examples
rdplyrdifftime

Calculating conditional cumulative time


Following the pointers from this question.

I'd like to calculate the cumulative time for all the Cats, by considering their respective last toggle status.

EDIT: I'd also want to check if the FIRST Toggle status of a Cat is Off and if it is so, for that specific cat, the time from midnight 00:00:00 till this first FIRST Off time should be added to its total conditional cumulative ontime.

Sample data:

       Time Cat Toggle
1  05:12:09  36 On
2  05:12:12 26R Off # First Toggle of this Cat happens to be Off, Condition met
3  05:12:15 26R On
4  05:12:16 26R Off
5  05:12:18  99 Off # Condition met
6  05:12:18  99 On
7  05:12:24  36 Off
8  05:12:26  36 On
9  05:12:29  80 Off # Condition met
10 05:12:30  99 Off
11 05:12:31  95 Off # Condition met
12 05:12:32  36 Off

Desired sample output:

  Cat Time(Secs)
1 36  21
2 26R 18733 # (=1+18732), 18732 secs to be added = total Sec from midnight till 05:12:12
3 99  18750 # (=12+18738), 18738 secs to be added = total Sec from midnight till 05:12:18
4 ..  ..

Any sort of help is appreciated.


Solution

  • A possible solution using :

    # load the 'data.table'-package, convert 'df' to a 'data.table'
    # and 'Time'-column to a time-format
    library(data.table)
    setDT(df)[, Time := as.ITime(Time)]
    
    # calculate the time-difference
    df[, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
       , by = Cat]
    

    which gives:

       Cat time.diff
    1:  36        21
    2: 26R         1
    3:  99        12
    4:  80         0
    5:  95         0
    

    In respons to your question in the comments, you could do:

    # create a new data.table with midnigth times for the categories where
    # the first 'Toggle' is on "Off"
    df0 <- df[, .I[first(Toggle) == "Off"], by = Cat
              ][, .(Time = as.ITime("00:00:00"), Cat = unique(Cat), Toggle = "On")]
    
    # bind that to the original data.table; order on 'Cat' and 'Time'
    # and then do the same calculation
    rbind(df, df0)[order(Cat, Time)
                   ][, .(time.diff = sum((shift(Time, type = 'lead') - Time) * (Toggle == 'On'), na.rm = TRUE))
                                     , by = Cat]
    

    which gives:

       Cat time.diff
    1: 26R     18733
    2:  36        21
    3:  80     18749
    4:  95     18751
    5:  99     18750
    

    An alternative with base R (only original question):

    df$Time <- as.POSIXct(df$Time, format = "%H:%M:%S")
    
    stack(sapply(split(df, df$Cat),
                 function(x) sum(diff(x[["Time"]]) * (head(x[["Toggle"]],-1) == 'On'))))
    

    which gives:

      values ind
    1      1 26R
    2     21  36
    3      0  80
    4      0  95
    5     12  99
    

    Or with the (only original question):

    library(dplyr)
    library(lubridate)
    
    df %>% 
      mutate(Time = lubridate::hms(Time)) %>% 
      group_by(Cat) %>% 
      summarise(time.diff = sum(diff(Time) * (head(Toggle, -1) == 'On'),
                                na.rm = TRUE))