r dplyr tidyverse

Generate a random variable by id in R


I want to create a random ID variable based on an existing ID, so that observations with the same id always get the same random ID. Here is an example:

id  var1  var2
1   a     1
5   g     35
1   hf    658
2   f     576
9   d     54546
2   dg    76
3   g     5
3   g     5
5   gg    56
6   g     456
8v  g     6
9   e     778795

The expected result is:

id  var1  var2    id_random
1   a     1       9
5   g     35      1
1   hf    658     9
2   f     576     8
9   d     54546   3
2   dg    76      8
3   g     5       7
3   g     5       7
5   gg    56      1
6   g     456     5
8v  g     6       4
9   e     778795  3

Solution

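  • To create a new random id per group, shuffle the unique ids with sample and look up each row's position in that shuffle with match, or use cur_group_id in dplyr. The new ids run from 1 to the total number of groups, and rows that share an id get the same new id.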

    Base R

    dat$random_id <- match(dat$id, sample(unique(dat$id)))
    

    dplyr

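    library(dplyr)
    dat %>%
      # shuffled factor levels give the groups a random order (id becomes a factor)
      group_by(id = factor(id, levels = sample(unique(id)))) %>%
      # cur_group_id() numbers the groups in that order
      mutate(random_id = cur_group_id()) %>%
      ungroup()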
    

    output (random_id differs between runs unless a seed is set with set.seed)

       id    var1    var2 random_id
     1 1     a          1         6
     2 5     g         35         2
     3 1     hf       658         6
     4 2     f        576         4
     5 9     d      54546         5
     6 2     dg        76         4
     7 3     g          5         7
     8 3     g          5         7
     9 5     gg        56         2
    10 6     g        456         1
    11 8     g          6         3
    12 9     e     778795         5
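
For anyone who wants to try the base R approach end to end, here is a minimal, self-contained sketch. The data frame is retyped from the question (with id stored as character, since it contains "8v"), and the seed value and the name shuffled are my own assumptions for illustration; neither appears in the original answer.

```r
# Data retyped from the question; id is kept as character because of "8v"
dat <- data.frame(
  id   = c("1", "5", "1", "2", "9", "2", "3", "3", "5", "6", "8v", "9"),
  var1 = c("a", "g", "hf", "f", "d", "dg", "g", "g", "gg", "g", "g", "e"),
  var2 = c(1, 35, 658, 576, 54546, 76, 5, 5, 56, 456, 6, 778795),
  stringsAsFactors = FALSE
)

set.seed(42)  # assumption: any fixed seed makes the mapping reproducible

# Shuffle the distinct ids once; each id's position in the shuffle
# becomes its random id, so equal ids always map to the same value
shuffled <- sample(unique(dat$id))
dat$random_id <- match(dat$id, shuffled)

# Sanity check: every original id maps to exactly one random id
stopifnot(all(tapply(dat$random_id, dat$id, function(x) length(unique(x))) == 1))

print(dat)
```

Keeping shuffled around also gives you the lookup table if you ever need to map the random ids back: shuffled[dat$random_id] returns the original id column.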