
R: Collapse duplicated values in a column while keeping the order


I'm sure this is super simple, but I just can't find the answer. I have a data frame like so:

    Id  event
1   1   A
2   1   B
3   1   A
4   1   A
5   2   C
6   2   C
7   2   A

I'd like to group by Id and collapse consecutive duplicate event values while keeping the event order, like so:

    Id  event
1   1   A
2   1   B
3   1   A
4   2   C
5   2   A

Most of my searches point to the distinct() or unique() functions, but those lead to losing the A event in row 3 for Id 1.

Thanks in advance!


Solution

  • We can use lead() to compare each row with the next one and keep only the rows whose Id or event differs from the row that follows. The is.na(lead(Id)) condition also keeps the last row, where lead() returns NA.

    library(dplyr)
    
    dat2 <- dat %>% 
      filter(!(Id == lead(Id) & event == lead(event)) | is.na(lead(Id)))
    dat2
    #   Id event
    # 1  1     A
    # 2  1     B
    # 3  1     A
    # 4  2     C
    # 5  2     A
    

    DATA

    dat <- read.table(text = "    Id  event
    1   1   A
    2   1   B
    3   1   A
    4   1   A
    5   2   C
    6   2   C
    7   2   A",
                      header = TRUE, stringsAsFactors = FALSE)
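For comparison, the same idea can be sketched in base R without dplyr. This is not from the answer above, just a minimal alternative under a few assumptions: the data frame has at least one row, there are no NA values in Id or event, and rows are already in the order you want to keep. The names keep and dat3 are illustrative only.

```r
# Same example data as in the question, built directly as a data frame
dat <- data.frame(
  Id    = c(1, 1, 1, 1, 2, 2, 2),
  event = c("A", "B", "A", "A", "C", "C", "A"),
  stringsAsFactors = FALSE
)

n <- nrow(dat)

# A row starts a new run if it is the first row, or if its Id or event
# differs from the row directly before it
keep <- c(TRUE,
          dat$Id[-1] != dat$Id[-n] | dat$event[-1] != dat$event[-n])

# Keep only the rows that start a run, then reset the row names
dat3 <- dat[keep, ]
rownames(dat3) <- NULL
dat3
```

Note that this version keeps the first row of each run, while the lead() version keeps the last row. For the two columns here the result is the same, but the choice matters if the data frame carries other columns that vary within a run.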