Detect sequences rowwise


Please see the toy data below; I want to create the column "check" (the expected result is included). For each row, if at least 3 consecutive values from y_2018 to y_2021 are >= 20, check should be TRUE, otherwise FALSE.

A dplyr solution is preferred. The original dataset has hundreds of columns and thousands of rows. NAs could be anywhere.

test <- data.frame(country = c("US", "UK", "RU", "GR", "BE"),
                   y_2018 = c(NA, 30, 20, 40, 10),
                   y_2019 = c(10, 10, 20, 20, 20),
                   y_2020 = c(20, NA, 30, 20, 20),
                   y_2021 = c(NA, 70, 10, 10, NA),
                   check = c(FALSE, FALSE, TRUE, TRUE, FALSE))

Solution

  • Here's a way using rle(), which computes the lengths of runs of equal values within each row:

    library(dplyr)
    
    test %>%
      rowwise() %>%
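      mutate(check = {
        # run-length encode the row: TRUE where the value is >= 20
        tmp <- rle(c_across(starts_with('y')) >= 20)
        # TRUE if any run of TRUE values has length 3 or more; NA runs are ignored
        any(tmp$lengths[tmp$values] >= 3, na.rm = TRUE)
      }) %>%
      ungroup()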
    
    # country y_2018 y_2019 y_2020 y_2021 check
    #  <chr>    <dbl>  <dbl>  <dbl>  <dbl> <lgl>
    #1 US          NA     10     20     NA FALSE
    #2 UK          30     10     NA     70 FALSE
    #3 RU          20     20     30     10 TRUE 
    #4 GR          40     20     20     10 TRUE 
    #5 BE          10     20     20     NA FALSE
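
If rowwise() turns out to be slow on a wide dataset, the same run-length idea can be sketched in base R. This is only a sketch, not a replacement for the dplyr answer: the helper name has_run and its arguments threshold and min_len are made up for illustration, and it assumes the year columns all start with "y_". Note that rle() puts every NA in its own run of length 1, so an NA always breaks a sequence.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package):
# TRUE if x contains at least `min_len` consecutive values >= `threshold`.
has_run <- function(x, threshold = 20, min_len = 3) {
  runs <- rle(x >= threshold)
  # which() keeps only the runs whose value is TRUE and drops NA runs
  any(runs$lengths[which(runs$values)] >= min_len)
}

test <- data.frame(
  country = c("US", "UK", "RU", "GR", "BE"),
  y_2018 = c(NA, 30, 20, 40, 10),
  y_2019 = c(10, 10, 20, 20, 20),
  y_2020 = c(20, NA, 30, 20, 20),
  y_2021 = c(NA, 70, 10, 10, NA)
)

# Assumption: every year column is named with a "y_" prefix
year_cols <- grep("^y_", names(test), value = TRUE)

# Apply the helper to each row of the numeric year columns
test$check <- unname(apply(as.matrix(test[year_cols]), 1, has_run))
```

Keeping the threshold and run length as arguments makes it easy to try other rules, e.g. has_run(x, threshold = 10, min_len = 2), without touching the row-wise loop.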