Search code examples
rdplyrsplitaggregate

Iterative labelling of a contiguous binary value arranged by date


I'm looking to generate a column in a table that iterates each time there is a state change. It's important this is arranged by date

For example, take the following input data

dummy_d = data.frame(date = seq.POSIXt(from = as.POSIXct("2022-02-14 09:00:00"),
                                   to=as.POSIXct("2022-02-14 09:06:00"),
                                   by = "1 min"),
                 valid = c(1,1,0,0,1,0,0))

The calculated output should look like

dummy_d %>% mutate(group = c(1,1,2,2,3,4,4))

That is to say, arranging by time, each time valid switches from 0 to 1 vice versa, group cumulatively builds.

Any help appreciated


Solution

  • With dplyr alone you could try the following:

    library(dplyr)
    
    dummy_d %>%
      mutate(group = cumsum(valid != lag(valid, default = first(valid))) + 1)
    

    Output

                     date valid group
    1 2022-02-14 09:00:00     1     1
    2 2022-02-14 09:01:00     1     1
    3 2022-02-14 09:02:00     0     2
    4 2022-02-14 09:03:00     0     2
    5 2022-02-14 09:04:00     1     3
    6 2022-02-14 09:05:00     0     4
    7 2022-02-14 09:06:00     0     4