Search code examples
rdplyrduplicatesrenamedata-manipulation

Rename duplicated column values group by another column


I have a dataset in which the same id appears in different groups:

df <- read.table(text='id   group  
 1         A 
 2         A 
 2         A 
 1         B 
 1         B 
 2         B
 2         C
 2         C 
 1         C 
 2         D         
 1         D
 1         D', header=TRUE)

I want to rename the duplicated values under column id that are grouped by another column group. The expected output is:

   id  group
    1     A
    2     A
    2     A
    1_2   B
    1_2   B
    2_2   B
    2_3   C
    2_3   C
    1_3   C
    2_4   D
    1_4   D
    1_4   D

How do I do that?


Solution

  • Here is a data.table approach using rleid() to generate a run-length id for each unique id and group combination. We can then just paste() that number to the existing id, where it is >1.

    library(data.table)
    setDT(df)
    
    df[, id_num := rleid(group), id][
        ,
        id := fifelse(
            id_num == 1,
            as.character(id),
            paste(id, id_num, sep = "_")
        )
    ][, `:=`(id_num = NULL)]
    
    print(df)
    
    #         id  group
    #     <char> <char>
    #  1:      1      A
    #  2:      2      A
    #  3:      2      A
    #  4:    1_2      B
    #  5:    1_2      B
    #  6:    2_2      B
    #  7:    2_3      C
    #  8:    2_3      C
    #  9:    1_3      C
    # 10:    2_4      D
    # 11:    1_4      D
    # 12:    1_4      D