r, dplyr, tidyverse, sequence

How to create a sequence by group from a specific string in R?


I would like to create a sequence of numbers within each group, starting after a specific string.

In this example, if Colb matches UNP, the sequence (Seq column) should start from the next row.

ColA Colb Seq
A HM 0
A RES 0
A UNP 0
A RES 1
A RES 2
A HM 3
B HM 0
B RES 0
B UNP 0
B RES 1
B UNP 2
C UNP 0

Only the first instance of UNP in each group should start the sequence; later instances of UNP in the same group are numbered like any other row.


Solution

  • You can first create a column that flags the first occurrence of "UNP" in each ColA group, then apply cumsum() twice and lag() once to calculate the Seq column.

    library(dplyr)
    
    df <- read.table(header = TRUE, text = "
    ColA    Colb    Seq
    A   HM  0
    A   RES 0
    A   UNP 0
    A   RES 1
    A   RES 2
    A   HM  3
    B   HM  0
    B   RES 0
    B   UNP 0
    B   RES 1
    B   UNP 2
    C   UNP 0") %>% 
      select(-Seq)
    
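    df %>%
      # Flag the first "UNP" row within each ColA group (1 there, 0 elsewhere)
      group_by(ColA, Colb) %>%
      mutate(seq_count = ifelse(first(Colb) == "UNP" & !duplicated(Colb), 1, 0)) %>%
      # Double cumsum() counts up from the flagged row; lag() shifts the count down one row
      group_by(ColA) %>%
      mutate(Seq = lag(cumsum(cumsum(seq_count)), default = 0), .keep = "unused")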
    #> # A tibble: 12 × 3
    #> # Groups:   ColA [3]
    #>    ColA  Colb    Seq
    #>    <chr> <chr> <dbl>
    #>  1 A     HM        0
    #>  2 A     RES       0
    #>  3 A     UNP       0
    #>  4 A     RES       1
    #>  5 A     RES       2
    #>  6 A     HM        3
    #>  7 B     HM        0
    #>  8 B     RES       0
    #>  9 B     UNP       0
    #> 10 B     RES       1
    #> 11 B     UNP       2
    #> 12 C     UNP       0
    

    Created on 2022-03-31 by the reprex package (v2.0.1)
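
To see why cumsum() is applied twice, it may help to keep the intermediate columns instead of dropping them. The sketch below is an illustration rather than the answer's exact code: the column names first_unp, started and running are made up for this example, and the first "UNP" row is flagged with a cumsum() test inside a single group_by(ColA) instead of the extra group_by(ColA, Colb) step.

```r
library(dplyr)

# Same example data as above, without the expected Seq column.
df <- data.frame(
  ColA = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C"),
  Colb = c("HM", "RES", "UNP", "RES", "RES", "HM",
           "HM", "RES", "UNP", "RES", "UNP", "UNP")
)

steps <- df %>%
  group_by(ColA) %>%
  mutate(
    # 1 on the first "UNP" row of each ColA group, 0 elsewhere
    # (hypothetical helper column, not in the original answer)
    first_unp = as.integer(Colb == "UNP" & cumsum(Colb == "UNP") == 1),
    # first cumsum(): 1 from the first "UNP" row onwards, 0 before it
    started = cumsum(first_unp),
    # second cumsum(): running count that starts at 1 on the "UNP" row itself
    running = cumsum(started),
    # lag() shifts the count down one row, so counting begins after "UNP"
    Seq = lag(running, default = 0L)
  ) %>%
  ungroup()

steps$Seq
#> [1] 0 0 0 1 2 3 0 0 0 1 2 0
```

For group B, started is 0 0 1 1 1 and running is 0 0 1 2 3, so the second "UNP" row is counted like any other row, which matches the expected Seq column in the question.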