Tags: r, dplyr, tidyverse

How to assign custom IDs for consecutive similar entries with exceptions?


I want custom IDs such that each run of consecutive As or consecutive Bs shares a single ID, while every C gets its own ID even when Cs are adjacent, as shown below:

Required

   X ID
1  A  1
2  B  2
3  A  3
4  A  3
5  B  4
6  B  4
7  B  4
8  C  5
9  C  6
10 B  7

Attempted

library(tidyverse)
df1 <-
  data.frame(X = c("A", "B", "A", "A", "B", "B", "B", "C", "C", "B"))

df1 %>% 
  mutate(ID = row_number())
#>    X ID
#> 1  A  1
#> 2  B  2
#> 3  A  3
#> 4  A  4
#> 5  B  5
#> 6  B  6
#> 7  B  7
#> 8  C  8
#> 9  C  9
#> 10 B 10

Solution

  • Usually we'd use rle,

    > transform(df1, ID=with(rle(X), rep(seq_along(values), lengths)))
       X ID
    1  A  1
    2  B  2
    3  A  3
    4  A  3
    5  B  4
    6  B  4
    7  B  4
    8  C  5
    9  C  5
    10 B  6
    

    (note that these are run IDs, not "row numbers")

    but to give each "C" its own consecutive ID, you could use a temporary column in which every "C" is replaced by a unique value, like so.

    > df1 |> 
    +   within({
    +     .tmp <- X
    +     .tmp[.tmp == 'C'] <- seq_along(.tmp[.tmp == 'C'])
    +     ID <- with(rle(.tmp), rep(seq_along(values), lengths))
    +     rm(.tmp)
    +   })
       X ID
    1  A  1
    2  B  2
    3  A  3
    4  A  3
    5  B  4
    6  B  4
    7  B  4
    8  C  5
    9  C  6
    10 B  7
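
Since the question is tagged dplyr, here is a minimal sketch of a pipe-friendly alternative that is not part of the answer above. It starts a new ID whenever the value differs from the previous row or the value is "C", and takes the cumulative sum of those starts. It assumes X is a character column (the default for data.frame() in R >= 4.0) and that X never contains the empty string, which is used here only as a placeholder "previous value" for the first row.

```r
library(dplyr)

df1 <- data.frame(X = c("A", "B", "A", "A", "B", "B", "B", "C", "C", "B"))

res <- df1 %>%
  mutate(
    # TRUE where a new group starts: the value changed from the previous
    # row, or the value is "C" (every "C" opens its own group).
    # default = "" is an assumed sentinel so the first row counts as a change.
    ID = cumsum(X != lag(X, default = "") | X == "C")
  )

res$ID
#> [1] 1 2 3 3 4 4 4 5 6 7
```

If more values than "C" should always get their own ID, the condition X == "C" could be swapped for something like X %in% c("C", "D").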