Search code examples
rsasrenaming

Renaming duplicate variables in SAS or R


I am trying to write a code to create a new column that will label a corresponding compound with a number. I have compounds that are repeated in a list and I need those to be labeled with the same number but a letter to separate the compounds. I have no idea how to code this. Thank you, Example below: What currently have:

Fructose 1
Maltose  2
Sucrose  3
Sucrose  4

What want:

    Fructose 1
    Maltose 2
    Sucrose 3
    Sucrose 3b

I cannot label each compound by hand because I have such a large dataset.


Solution

  • Here's how I would do this with R and the data.table package.

    First, we will key (and sort) the data by compound. Then, we will create our own index and add letters to its dupes (though not sure how you want to handle groups larger than 26)

    library(data.table)
    setkey(setDT(df), compound)[, indx := as.character(.GRP), by = compound]
    df[duplicated(df), indx := paste0(indx, letters[seq_len(.N)])]
    df
    #    compound number indx
    # 1: Fructose      1    1
    # 2:  Maltose      2    2
    # 3:  Sucrose      3    3
    # 4:  Sucrose      4   3a