Tags: r, string, random, stringi

Generate a unique random string in R using stringi


I have data where each row is a person. I want to make a randomly generated unique ID, so I can identify them in analysis.

Here is a sample data frame with 50,000 rows:

df <- data.frame(
  gender = rep(c("M", "F", "M", "M", "F"), 10000),
  qtr = sample(1:99, 50000, replace = TRUE),
  result = sample(100:1000, 50000, replace = TRUE)
)

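To generate a unique ID, I am using stringi:

library(stringi)
library(magrittr)
library(dplyr)  # mutate() comes from dplyr, not tidyr

# Two random uppercase letters followed by six random digits, e.g. "QK048213"
df <- df %>%
  mutate(UniqueID = do.call(paste0, Map(stri_rand_strings, n = nrow(df), length = c(2, 6),
                                        pattern = c('[A-Z]', '[0-9]'))))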

However, when I check whether the new UniqueID variable is unique with the code below, I find some duplicates:

length(unique(df$UniqueID))  # less than nrow(df) whenever there are duplicates
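Duplicates are expected here rather than a sign of a bug. Two letters plus six digits give 26^2 * 10^6 = 676,000,000 possible IDs, and by the birthday-problem approximation, drawing 50,000 of them independently produces about 1.85 duplicate pairs on average, with roughly an 84% chance of at least one duplicate. The short base R calculation below reproduces those numbers; it assumes every ID is equally likely and drawn independently of the others.

```r
# Number of IDs drawn and size of the ID space for
# two uppercase letters followed by six digits
n_ids <- 50000
id_space <- 26^2 * 10^6   # 676,000,000 possible IDs

# Birthday-problem approximation, assuming uniform, independent draws
pairs <- choose(n_ids, 2)                 # pairs of rows that could collide
expected_dup_pairs <- pairs / id_space    # expected number of colliding pairs
p_any_dup <- 1 - exp(-pairs / id_space)   # probability of at least one duplicate

round(expected_dup_pairs, 2)  # about 1.85
round(p_any_dup, 2)           # about 0.84
```

In other words, making the ID longer only lowers the odds of a collision; it never rules one out.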

Is there a way to generate a unique ID which is truly unique, with no duplicates?
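One way to make the IDs unique by construction, sketched below in base R instead of stringi, is to sample distinct positions in the ID space without replacement and decode each position into the same "two letters + six digits" format. Because sample.int() without replacement never returns the same position twice and the decoding is one-to-one, no ID can repeat. The function name make_unique_ids is hypothetical.

```r
# Hypothetical helper: n distinct IDs of the form "AB123456"
make_unique_ids <- function(n) {
  id_space <- 26^2 * 10^6              # 676,000,000 possible IDs
  stopifnot(n <= id_space)

  # Distinct 0-based positions in the ID space (sampling without replacement)
  pos <- sample.int(id_space, n) - 1

  digits <- pos %% 10^6                # last six characters
  letter_pos <- pos %/% 10^6           # 0..675, encodes the two letters
  first <- LETTERS[letter_pos %/% 26 + 1]
  second <- LETTERS[letter_pos %% 26 + 1]

  paste0(first, second, sprintf("%06d", as.integer(digits)))
}

set.seed(1)
new_ids <- make_unique_ids(50000)
head(new_ids)
anyDuplicated(new_ids)  # 0 means no duplicates
```

With the sample data above, this would be used as df$UniqueID <- make_unique_ids(nrow(df)). If the exact format does not matter, a random permutation of the row numbers, such as sample(nrow(df)), is an even simpler identifier that is unique by construction.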

I have seen these questions, but they don't explain how to make the randomly generated values unique:

  • Generating unique random numbers in dataframe column in R
  • Create a dataframe with random numbers in each column

Thanks


Solution

  • You can use the ids package to create random IDs. For instance, to make 1 million (1E6) user IDs, you could use:

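    randos <- ids::random_id(1E6, 4)
    # The 2nd argument sets how many random bytes go into each ID (4 bytes = 8 hex characters).
    # The default, 16 bytes, makes much longer IDs and crashes my computer.
    # Random IDs are not guaranteed to be unique, so check with anyDuplicated(randos).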
    
    head(randos)
    #[1] "31ca372d" "d462e55f" "2374cc78" "15511574" "ecbf2d65" "236cb2d3"
    

    It has other nice features, like the adjective_animal function, which creates IDs that are easier for humans to distinguish and remember.

    creatures <- ids::adjective_animal(1E6, n_adjectives = 1)
    head(creatures)
    #[1] "yestern_lizard"          "insensible_purplemarten"
    #[3] "cubical_anhinga"         "theophilic_beaver"      
    #[5] "subzero_greyhounddog"    "hurt_weasel"
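Note that ids::random_id() draws random bytes, so its output is not guaranteed to be unique either. With 4 bytes there are 2^32 (about 4.3 billion) possible IDs, and the same birthday approximation gives roughly 116 expected duplicate pairs among 1E6 draws. A simple safeguard is to keep drawing until you have enough distinct values. The sketch below assumes a generator function that takes a count and returns that many random strings; top_up_unique and random_hex4 are hypothetical helpers, with random_hex4 standing in for ids::random_id(k, 4) so the example runs in base R.

```r
# Hypothetical helper: keep drawing until there are n distinct IDs.
# `generator` takes a count k and returns k random strings.
top_up_unique <- function(n, generator) {
  out <- unique(generator(n))
  while (length(out) < n) {
    # Only ask for as many new IDs as are still missing
    out <- unique(c(out, generator(n - length(out))))
  }
  out
}

# Hypothetical base-R stand-in for ids::random_id(k, 4):
# k strings of 4 random bytes, each written as 8 hex characters
random_hex4 <- function(k) {
  bytes <- matrix(sample.int(256, 4 * k, replace = TRUE) - 1L, ncol = 4)
  apply(bytes, 1, function(b) paste(sprintf("%02x", b), collapse = ""))
}

set.seed(1)
hex_ids <- top_up_unique(10000, random_hex4)
length(hex_ids)         # 10000
anyDuplicated(hex_ids)  # 0, no duplicates
```

Assuming the ids package is installed, the same wrapper could be called as top_up_unique(1E6, function(k) ids::random_id(k, 4)). The loop terminates as long as n is smaller than the number of possible IDs, and each pass only regenerates the shortfall, so it rarely needs more than one or two extra draws.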