
How to assign a unique number based on multiple columns in R dataset?


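I collected some data in which observations are distinguished by year, month, and level. I want to assign a unique code (a simple integer) to each distinct combination of these three columns alone, so that rows sharing the same year, month, and level get the same code. Any suggestions on how to proceed?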

year <- c("A","J","J","S")
month <- c(2000,2001,2001,2000)
level <- c("high","low","low","low")
site <- c(1,2,3,3)
val1 <- c(1,2,3,0)

df <- data.frame(year,month,level,site,val1)

# Desired result: a new column df$Unique.code containing 1, 2, 2, 3

Solution

  • dplyr has the cur_group_id() function for this:

    library(dplyr)

    df %>%
      group_by(year, month, level) %>%
      mutate(id = cur_group_id())
    # # A tibble: 4 × 6
    # # Groups:   year, month, level [3]
    #   year  month level  site  val1    id
    #   <chr> <dbl> <chr> <dbl> <dbl> <int>
    # 1 A      2000 high      1     1     1
    # 2 J      2001 low       2     2     2
    # 3 J      2001 low       3     3     2
    # 4 S      2000 low       3     0     3
    

    Or, in base R, we could paste the three columns together and coerce the resulting factor to an integer:

    df$group_id = with(df, as.integer(factor(paste(year, month, level))))
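
One caveat with the factor approach: factor() sorts its levels, so the ids follow the sorted order of the pasted keys rather than the order in which the groups first appear. That makes no difference for the example data, which already happens to be in sorted order. If ids in order of first appearance are wanted, a minimal base R sketch could look like the following. The `key` variable and the "|" separator are illustrative choices, not part of the original answer, and the separator is assumed not to occur in the data.

```r
# Same example data as in the question
year  <- c("A", "J", "J", "S")
month <- c(2000, 2001, 2001, 2000)
level <- c("high", "low", "low", "low")
site  <- c(1, 2, 3, 3)
val1  <- c(1, 2, 3, 0)
df <- data.frame(year, month, level, site, val1)

# Build one key per row from the three grouping columns.
# sep = "|" is an assumption: any string that never appears in the data works.
key <- with(df, paste(year, month, level, sep = "|"))

# match() against unique() numbers each key by the position of its
# first occurrence, so ids follow the row order instead of sorted order.
df$group_id <- match(key, unique(key))
```

Using an explicit separator also guards against two different combinations pasting to the same string, which can happen with the default space separator if the values themselves contain spaces.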