Add index to runs of equal values, accounting for NA

This is an example of my data:

df <- data.frame(dyad = c("a", "a", "b", NA, "c", NA, "c", "b"))
df
#   dyad
# 1    a
# 2    a
# 3    b
# 4 <NA>
# 5    c
# 6 <NA>
# 7    c
# 8    b

I want to create an index for consecutive runs of identical dyad values.

Note 1: a dyad value might be repeated throughout the data frame, but it should always get a new unique id if it is not consecutive to the previous rows with the same dyad. E.g. the "b" on rows 3 and 8 should have different ids.

Note 2: identical dyad values before and after an NA should have different ids. E.g. the "c" on rows 5 and 7, which are separated by the NA on row 6, should have different ids.

Thus, the expected result is:

#   dyad event
# 1    a     1
# 2    a     1
# 3    b     2
# 4 <NA>    NA
# 5    c     3
# 6 <NA>    NA
# 7    c     4
# 8    b     5

Any insight or advice on how to make this work is welcome!

Solution

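One option is to combine rleid from data.table with cumsum. rleid gives every run, including each NA, its own id; subtracting the running count of NAs closes the gaps the NAs leave in the numbering.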

library(data.table)

df$event <- rleid(df$dyad) - cumsum(is.na(df$dyad))
df$event[is.na(df$dyad)] <- NA
df

#  dyad event
#1    a     1
#2    a     1
#3    b     2
#4 <NA>    NA
#5    c     3
#6 <NA>    NA
#7    c     4
#8    b     5
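
To see why the subtraction works, it can help to print the intermediate vectors. This is only an illustrative sketch on the example data; the names run_id and na_count are not part of the solution above and are introduced here for explanation.

```r
library(data.table)

dyad <- c("a", "a", "b", NA, "c", NA, "c", "b")

run_id   <- rleid(dyad)          # every run, NA included, gets its own id
na_count <- cumsum(is.na(dyad))  # number of NAs seen so far

run_id
# [1] 1 1 2 3 4 5 6 7
na_count
# [1] 0 0 0 1 1 2 2 2
run_id - na_count
# [1] 1 1 2 2 3 3 4 5
```

The NA positions still hold a number after the subtraction, which is why the second line of the solution overwrites them with NA.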

However, the above solution does not work when there are consecutive NAs, because rleid treats them as a single run while cumsum counts each one. In that case we can use:

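x <- c("a", NA, NA, "a", "b", "b", "c", NA)
# flag the first element of each run, skip NA runs, then count the flags
y <- cumsum(!duplicated(rleid(x)) & !is.na(x))
# positions that are NA in the input get no id
y[is.na(x)] <- NA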
y
#[1]  1 NA NA  2  3  3  4 NA
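
If you would rather not depend on data.table, the same idea can probably be written in base R by comparing each value with the one before it. The following is a minimal sketch, not a tested drop-in replacement: the function name run_index is hypothetical, and it assumes x is a character or numeric vector (factors are converted to character first).

```r
# Hypothetical helper: index runs of equal non-NA values, NA stays NA
run_index <- function(x) {
  if (is.factor(x)) x <- as.character(x)
  n <- length(x)
  if (n == 0L) return(integer(0))

  not_na <- !is.na(x)
  prev   <- c(NA, x[-n])  # value on the previous row (NA for the first row)

  # a run starts at a non-NA value whose predecessor is NA or different
  starts <- not_na & (is.na(prev) | x != prev)

  id <- cumsum(starts)    # count run starts seen so far
  id[!not_na] <- NA       # NA rows get no id
  id
}

run_index(c("a", "a", "b", NA, "c", NA, "c", "b"))
# [1]  1  1  2 NA  3 NA  4  5
run_index(c("a", NA, NA, "a", "b", "b", "c", NA))
# [1]  1 NA NA  2  3  3  4 NA
```

Because a new run is only counted at non-NA values, consecutive NAs need no special handling here.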