I am trying to write a code to create a new column that will label a corresponding compound with a number. I have compounds that are repeated in a list and I need those to be labeled with the same number but a letter to separate the compounds. I have no idea how to code this. Thank you, Example below: What currently have:
Fructose 1
Maltose 2
Sucrose 3
Sucrose 4
What want:
Fructose 1
Maltose 2
Sucrose 3
Sucrose 3b
I cannot label each compound by hand because I have such a large dataset.
Here's how I would do this with R and the data.table
package.
First, we will key (and sort) the data by compound
. Then, we will create our own index and add letters to its dupes (though not sure how you want to handle groups larger than 26)
library(data.table)
setkey(setDT(df), compound)[, indx := as.character(.GRP), by = compound]
df[duplicated(df), indx := paste0(indx, letters[seq_len(.N)])]
df
# compound number indx
# 1: Fructose 1 1
# 2: Maltose 2 2
# 3: Sucrose 3 3
# 4: Sucrose 4 3a