
Calculate the time intervals within each cycle using R


I have data like this:

data<-data.frame(time=c(20230404001040, 20230404001050,20230404001100, 20230404001110, 20230404001120,20230404001130,
                        20230404001140,20230404001150,20230404001200),
                 on=c("FALSE", "FALSE", "FALSE", "TRUE","TRUE","TRUE","FALSE","FALSE","FALSE"))

'time' is a timestamp in ymd_hms form (year, month, day, hour, minute, second). I think I can convert it with data[,1] <- ymd_hms(data[,1]). 'on' records the switch state: FALSE means the switch is off, and TRUE means it is on.
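As a rough illustration of that conversion, here is a base R sketch that does not need lubridate. It swaps ymd_hms() for as.POSIXct() with an explicit format string. The time zone "UTC" is an assumption (the question does not state one), and time_raw, stamp, on_lgl and gaps are names made up for this example.

```r
# First three rows of the question's data, kept as plain vectors.
time_raw <- c(20230404001040, 20230404001050, 20230404001100)
on_chr   <- c("FALSE", "FALSE", "TRUE")

# sprintf("%.0f") turns the numeric stamp into a 14-digit string
# without any risk of scientific notation.
# tz = "UTC" is an assumption; the question gives no time zone.
stamp <- as.POSIXct(sprintf("%.0f", time_raw),
                    format = "%Y%m%d%H%M%S", tz = "UTC")

# The `on` column is stored as text; as.logical() makes it a real flag.
on_lgl <- as.logical(on_chr)

# Gaps between consecutive rows, in seconds.
gaps <- as.numeric(diff(stamp), units = "secs")
```

Checking gaps this way is a cheap test of whether the rows really are 10 seconds apart before relying on row counts.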

I want to calculate the duration of each on/off event. Consecutive rows are 10 seconds apart, so I can count the rows within each on/off event and multiply by 10. My desired output looks like this:

data<-data.frame(time=c(20230404001040, 20230404001050,20230404001100, 20230404001110, 20230404001120,20230404001130,
                        20230404001140,20230404001150,20230404001200),
                 on=c("FALSE", "FALSE", "FALSE", "TRUE","TRUE","TRUE","FALSE","FALSE","FALSE"),
                 time_after_switch=c(0,10,20,0,10,20,0,10,20))

In my data, the first 3 rows are a switch-off event, the next 3 rows are a switch-on event, and the last 3 rows are another switch-off event, so I think of it as 3 cycles. Across these cycles, the elapsed times are 0,10,20,0,10,20,0,10,20. I want R code that calculates the values of time_after_switch.
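The row-counting idea described above can be sketched in base R roughly as follows. This assumes the rows are already sorted by time and are evenly spaced 10 seconds apart. step_seconds and run_lengths are hypothetical names introduced for the example.

```r
# Toy data copied from the question (`on` kept as character strings).
data <- data.frame(
  time = c(20230404001040, 20230404001050, 20230404001100,
           20230404001110, 20230404001120, 20230404001130,
           20230404001140, 20230404001150, 20230404001200),
  on = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE",
         "FALSE", "FALSE", "FALSE")
)

# Assumption: every row is exactly 10 seconds after the previous one.
step_seconds <- 10

# rle() reports the length of each run of identical `on` values.
run_lengths <- rle(data$on)$lengths

# sequence() expands the run lengths into a within-run row index
# (1, 2, 3, 1, 2, 3, ...); subtracting 1 makes each run start at 0.
data$time_after_switch <- (sequence(run_lengths) - 1) * step_seconds
```

This only counts rows, so it would give wrong durations if any log entries were missing or irregularly spaced.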


Solution

  • one approach (using the actual time spans between log entries):

    ## helper function to uniquely label blocks
    ## of continuous state for later groupwise
    ## duration summing:
    
    get_block_labels <- function(xs){
      rls <- rle(xs)$lengths
      ## seq_along() is also safe for zero-length input
      rep(seq_along(rls), times = rls)
    }
    
    library(dplyr)
    library(lubridate) ## provides ymd_hms()
    
    data |>
      arrange(time) |>
      mutate(time = time |> as.character() |>  ymd_hms(),
             dt = (time - lag(time, default = time[1])) |> as.integer(),
             block = get_block_labels(on)
             ) |>
      group_by(block) |>
      mutate(dur = cumsum(dt))
    

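    output (dur starts at 10 rather than 0 in blocks 2 and 3, because dt in a block's first row is the gap from the last entry of the previous block):

    # A tibble: 9 x 5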
    # Groups:   block [3]
      time                on       dt block   dur
      <dttm>              <chr> <int> <int> <int>
    1 2023-04-04 00:10:40 FALSE     0     1     0
    2 2023-04-04 00:10:50 FALSE    10     1    10
    3 2023-04-04 00:11:00 FALSE    10     1    20
    4 2023-04-04 00:11:10 TRUE     10     2    10
    5 2023-04-04 00:11:20 TRUE     10     2    20
    6 2023-04-04 00:11:30 TRUE     10     2    30
    7 2023-04-04 00:11:40 FALSE    10     3    10
    8 2023-04-04 00:11:50 FALSE    10     3    20
    9 2023-04-04 00:12:00 FALSE    10     3    30
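
  • a possible variant, if every block should restart at 0 as in the question's desired output: measure each row against the first timestamp of its own block. This is a minimal base R sketch, not part of the original answer. The "UTC" time zone and the names stamp, runs, block and secs are assumptions made for the example, and the rows are assumed to be sorted by time.

```r
# Toy data copied from the question.
data <- data.frame(
  time = c(20230404001040, 20230404001050, 20230404001100,
           20230404001110, 20230404001120, 20230404001130,
           20230404001140, 20230404001150, 20230404001200),
  on = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "TRUE",
         "FALSE", "FALSE", "FALSE")
)

# Parse the numeric stamps (tz = "UTC" is an assumption).
stamp <- as.POSIXct(sprintf("%.0f", data$time),
                    format = "%Y%m%d%H%M%S", tz = "UTC")

# Label each run of identical `on` values: 1,1,1,2,2,2,3,3,3.
runs  <- rle(data$on)$lengths
block <- rep(seq_along(runs), times = runs)

# Seconds since the epoch, then subtract the first value of each block.
secs <- as.numeric(stamp)
data$time_after_switch <- ave(secs, block, FUN = function(s) s - s[1])
```

    Because this uses the timestamps themselves rather than a row count, it should also cope with gaps or uneven spacing in the log.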