Renaming duplicate strings in R


I have an R data frame with two columns of strings. One of the columns (say, Column1) contains duplicate values. I need to relabel that column so that the duplicated strings get ordered suffixes, as in Column1.new below:

 Column1   Column2   Column1.new
 1         A         1_1
 1         B         1_2
 2         C         2_1
 2         D         2_2
 3         E         3
 4         F         4

Any ideas of how to do this would be appreciated.

Cheers,

Antti


Solution

  • Let's say your data (ordered by Column1) is in an object called tab. First, create a run-length encoding of Column1:

    c1.rle <- rle(tab$Column1)
    c1.rle
    ##lengths: int [1:4] 2 2 1 1
    ##values : int [1:4] 1 2 3 4
    

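    That gives you the distinct values of Column1 and the number of consecutive appearances of each. Then use that information to build the new column with unique identifiers. Note that every value gets a suffix, including those that occur only once (3_1 and 4_1 rather than 3 and 4):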

    tab$Column1.new <- paste0(rep(c1.rle$values, times = c1.rle$lengths), "_",
            unlist(lapply(c1.rle$lengths, seq_len)))
    

    I'm not sure whether this is appropriate in your situation, but you could also simply paste Column1 and Column2 together to create a unique identifier.
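
If the data is not sorted by Column1, or if values that occur only once should stay unsuffixed as in the question's Column1.new, something along these lines might work. This is only a sketch: the data frame is rebuilt by hand from the table in the question, and `key`, `idx`, `n` and `Column12` are names made up for illustration. It uses base R's `ave()` to number rows within each group instead of `rle()`.

```r
# Example data rebuilt from the question's table (an assumption, not the real data)
tab <- data.frame(
  Column1 = c(1, 1, 2, 2, 3, 4),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Work on a character copy so the same code handles numbers, strings or factors
key <- as.character(tab$Column1)

# Running index within each group of equal values (no sorting required)
idx <- ave(seq_along(key), key, FUN = seq_along)

# Size of the group each row belongs to
n <- ave(seq_along(key), key, FUN = length)

# Append the suffix only where the value occurs more than once
tab$Column1.new <- ifelse(n > 1, paste0(key, "_", idx), key)
tab$Column1.new
## "1_1" "1_2" "2_1" "2_2" "3" "4"

# The alternative mentioned above: combine both columns into a single key
tab$Column12 <- paste(tab$Column1, tab$Column2, sep = "_")
```

The `paste()` variant only yields unique identifiers if no (Column1, Column2) pair repeats, which holds for the example table but may not hold for the real data.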