r dataframe sorting duplicates multiple-columns

Create a numeric category for each distinct value in a column in R


I have a column of IDs in a data frame that sometimes contains duplicates. For example:

ID
209
315
109
315
451
209

I want to take this column and create another column that indicates which ID each row belongs to, i.e. I want it to look like this:

ID   ID Category
209  1
315  2
109  3
315  2
451  4
209  1

Essentially, I want to loop through the IDs: if an ID equals a previous one, the row gets that ID's existing indicator, and if it is a new ID, I create a new indicator for it.

Does anyone know whether there is a quick function in R that does this? Or does anyone have other suggestions?


Solution

  • Convert the column to a factor whose levels are ordered by unique() (that is, by order of first appearance in the data set), then convert the factor to numeric:

    data$IDCategory <- as.numeric(factor(data$ID, levels = unique(data$ID)))
    
    #> data
    #   ID IDCategory
    #1 209          1
    #2 315          2
    #3 109          3
    #4 315          2
    #5 451          4
    #6 209          1
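
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is rebuilt from the example in the question (its construction is an assumption, since the original does not show how data was created). It also shows match() as a possible alternative that should give the same codes without going through a factor; that alternative is not part of the original answer.

```r
# Assumed reconstruction of the example data from the question
data <- data.frame(ID = c(209, 315, 109, 315, 451, 209))

# Factor-based approach from the answer: levels follow order of first appearance
factor_codes <- as.integer(factor(data$ID, levels = unique(data$ID)))

# Alternative sketch: match() gives, for each ID, its position in unique(data$ID)
data$IDCategory <- match(data$ID, unique(data$ID))

# Both approaches should agree on this data
stopifnot(identical(data$IDCategory, factor_codes))

print(data)
```

Both versions number the IDs by first appearance rather than by sorted value, which is what the question asks for. Calling factor(data$ID) without the levels argument would number them in sorted order instead.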