Search code examples
runiquerepeatunique-id

How to assign unique values (e.g. increasing numbers) to repeating column values in R?


I have a problem solving the following question:

the values start and end are repeated in every other rows. I would like to mark them with a unique number, that increases every time a new start value is reached.

image showcasing the problem and desired ID

I also want rows where the obj column is empty and not inbetween a start and an end, to remain empty in the ID column.

Thanks a lot in advance!


Solution

  • There might be a more streamlined way, but you can use cumsum() with some logic.

    library(dplyr)
    library(tidyr)
    
    x <- c("start", "end", NA)
    df <- tibble(obj = x[c(1, 2, 1, 3, 2, 3, 3, 1, 3, 2, 1, 3, 2, 3, 3, 1, 3, 2)])
    
    df %>% 
      mutate(ID = cumsum(replace_na(obj == "start", 0)),
             ID = if_else(ID == cumsum(replace_na(obj == "end", 0)) & is.na(obj), NA_integer_, ID))
    
    # A tibble: 18 x 2
       obj      ID
       <chr> <int>
     1 start     1
     2 end       1
     3 start     2
     4 NA        2
     5 end       2
     6 NA       NA
     7 NA       NA
     8 start     3
     9 NA        3
    10 end       3
    11 start     4
    12 NA        4
    13 end       4
    14 NA       NA
    15 NA       NA
    16 start     5
    17 NA        5
    18 end       5