How to assign a unique ID number to each group of identical values in a column


I have a data frame with a number of columns. I would like to create a new column called “id” that gives a unique id number to each group of identical values in the “sample” column.

Example data:

df <- data.frame(
  index = 1:30,
  val = c(
    14L, 22L, 1L, 25L, 3L, 34L, 35L, 36L, 24L, 35L, 33L, 31L, 30L,
    30L, 29L, 28L, 26L, 12L, 41L, 36L, 32L, 37L, 56L, 34L, 23L, 24L,
    28L, 22L, 10L, 19L
  ),
  sample = c(
    5L, 6L, 6L, 7L, 7L, 7L, 8L, 9L, 10L, 11L, 11L, 12L, 13L, 14L,
    14L, 15L, 15L, 15L, 16L, 17L, 18L, 18L, 19L, 19L, 19L, 20L, 21L,
    22L, 23L, 23L
  )
)

What I would like to end up with:

  index val sample id
1     1  14      5  1
2     2  22      6  2
3     3   1      6  2
4     4  25      7  3
5     5   3      7  3
6     6  34      7  3

Solution

  • How about this?

    df2 <- transform(df, id = as.numeric(factor(sample)))

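    I think this (cribbed from "Add ID column by group") should be slightly more efficient, although perhaps a little harder to remember. It numbers the groups in order of first appearance, whereas the factor() version numbers them in sorted order; the two agree here because sample is already sorted: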

    df3 <- transform(df, id=match(sample, unique(sample)))
    all.equal(df2,df3)  ## TRUE
    

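    If you want to do this in the tidyverse (cur_group_id() requires dplyr 1.0.0 or later):

    library(dplyr)
    df %>%
      group_by(sample) %>%
      mutate(id = cur_group_id()) %>%
      ungroup()  # drop the grouping so later steps see the whole data frame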
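
The base R approaches above only give identical IDs when the grouping column is already sorted. Here is a minimal sketch of the difference, using a small made-up vector `x` (hypothetical, not taken from the question) whose values are deliberately out of order:

```r
# Hypothetical unsorted input, for illustration only
x <- c(7L, 5L, 7L, 6L, 5L)

# factor() assigns codes by sorted level order: 5 -> 1, 6 -> 2, 7 -> 3
id_sorted <- as.integer(factor(x))

# match() against unique() assigns codes by first appearance: 7 -> 1, 5 -> 2, 6 -> 3
id_first <- match(x, unique(x))

print(id_sorted)  # 3 1 3 2 1
print(id_first)   # 1 2 1 3 2
```

Both results are valid group IDs, so pick whichever ordering you need. As far as I know, cur_group_id() follows the sorted group order, like the factor() version.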