This is an example of my data:
df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"))
df
# dyad
# 1 a
# 2 a
# 3 b
# 4 <NA>
# 5 c
# 6 <NA>
# 7 c
# 8 b
I want to create an index for consecutive runs of identical dyad values.
Note 1: a dyad value might be repeated throughout the data frame, but it should always get a new unique label if it is not consecutive to the previous rows with the same dyad. E.g. the "b" on rows 3 and 8 should have different ids.
Note 2: identical dyad values before and after an NA should have different ids. E.g. the "c" before and after the last NA should have different ids.
Thus, the expected result is:
# dyad event
# 1 a 1
# 2 a 1
# 3 b 2
# 4 <NA> NA
# 5 c 3
# 6 <NA> NA
# 7 c 4
# 8 b 5
Any insight into how to make this work, or any other advice, is welcome!
Using rleid from data.table and cumsum:
library(data.table)
df$event <- rleid(df$dyad) - cumsum(is.na(df$dyad))
df$event[is.na(df$dyad)] <- NA
df
# dyad event
#1 a 1
#2 a 1
#3 b 2
#4 <NA> NA
#5 c 3
#6 <NA> NA
#7 c 4
#8 b 5
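To see why the subtraction gives the expected ids on this data, it may help to look at the intermediate vectors. This is only an illustrative breakdown of the one-liner above; the names dyad, run_id, na_count and event are introduced here for the sketch and are not part of the original code.

```r
library(data.table)

dyad <- c("a", "a", "b", NA, "c", NA, "c", "b")

# rleid() numbers every run, and each isolated NA counts as a run of its own
run_id <- rleid(dyad)
run_id
# [1] 1 1 2 3 4 5 6 7

# number of NA values seen so far at each position
na_count <- cumsum(is.na(dyad))
na_count
# [1] 0 0 0 1 1 2 2 2

# subtracting removes the run numbers "used up" by the NA runs
event <- run_id - na_count
event[is.na(dyad)] <- NA
event
# [1]  1  1  2 NA  3 NA  4  5
```

Each NA here forms a run of length one, so the number of NA values seen so far equals the number of NA runs seen so far, which is exactly what has to be subtracted.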
However, the solution above does not work when there are consecutive NAs. In that case we can use:
x = c("a", NA, NA, "a", "b", "b", "c", NA)
y <- cumsum(!duplicated(rleid(x)) & !is.na(x))
y[is.na(x)] <- NA
y
#[1] 1 NA NA 2 3 3 4 NA
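If data.table is not available, the same idea can be sketched in base R: a run starts at every non-NA value whose predecessor is NA or different, and cumsum over those starts gives the id. The helper name run_index is hypothetical, and the sketch assumes the input can be compared as a character vector (factors are converted with as.character).

```r
# Hypothetical helper: index runs of identical non-NA values; NA stays NA
run_index <- function(x) {
  x <- as.character(x)                  # assumption: compare values as strings
  if (length(x) == 0) return(integer(0))
  ok <- !is.na(x)
  prev <- c(NA, x[-length(x)])          # value on the previous row
  # a new run starts at a non-NA value preceded by NA or by a different value
  starts <- ok & (is.na(prev) | prev != x)
  id <- cumsum(starts)
  id[!ok] <- NA
  id
}

run_index(c("a", NA, NA, "a", "b", "b", "c", NA))
# [1]  1 NA NA  2  3  3  4 NA

run_index(c("a", "a", "b", NA, "c", NA, "c", "b"))
# [1]  1  1  2 NA  3 NA  4  5
```

For the data frame in the question this could be used as df$event <- run_index(df$dyad).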