data.table offers a nice convenience function, rleid(), for run-length encoding:
library(data.table)
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)), value=1:10)
rleid(DT$grp)
# [1] 1 1 2 2 3 3 3 4 5 5
I can mimic this in base R with:
df <- data.frame(DT)
rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
# [1] 1 1 2 2 3 3 3 4 5 5
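The base R one-liner above calls rle() twice. A minimal sketch of the same idea wrapped in a helper that calls it once is shown here; the name rleid_base is hypothetical and not part of base R or data.table.

```r
# Hypothetical helper: run-length IDs from a single rle() call.
rleid_base <- function(x) {
  # as.character() lets factors through, since rle() only accepts plain atomic vectors
  runs <- rle(as.character(x))
  # one ID per run, repeated for the length of that run
  rep(seq_along(runs$lengths), times = runs$lengths)
}

grp <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))
rleid_base(grp)
# [1] 1 1 2 2 3 3 3 4 5 5
```

One caveat: rle() starts a new run at every NA, so consecutive NA values each get their own ID here, which may not match what rleid() returns for the same input.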
Does anyone know of a dplyr equivalent, or is the "best" way to get rleid() behavior with dplyr something like the following?
library(dplyr)
my_rleid = rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
df %>%
  mutate(rleid = my_rleid)
As of v1.1.0, dplyr provides the function consecutive_id(), modeled after data.table::rleid(), with the same support for multiple vectors and the same treatment of NA values.
library(dplyr)
DT %>%
  mutate(id = consecutive_id(grp))
#     grp value id
#  1:   A     1  1
#  2:   A     2  1
#  3:   B     3  2
#  4:   B     4  2
#  5:   C     5  3
#  6:   C     6  3
#  7:   C     7  3
#  8:   A     8  4
#  9:   B     9  5
# 10:   B    10  5
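To illustrate the multiple-vector support and NA handling mentioned above, here is a small sketch on made-up data. The data frame df2 and the column names id_x and id_xy are hypothetical, and dplyr >= 1.1.0 is assumed.

```r
library(dplyr)

# Hypothetical toy data: x ends in two consecutive NAs, y changes once
df2 <- data.frame(
  x = c("A", "A", "A", "B", "B", NA, NA),
  y = c(1, 1, 2, 2, 2, 2, 2)
)

res <- df2 %>%
  mutate(
    id_x  = consecutive_id(x),    # new ID whenever x changes
    id_xy = consecutive_id(x, y)  # new ID whenever x or y changes
  )

res$id_x
# [1] 1 1 1 2 2 3 3
res$id_xy
# [1] 1 1 2 3 3 4 4
```

The two trailing NA values in x share one ID, so consecutive missing values are treated as a single run rather than as separate ones.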