
R: how to sample rows with custom frequencies


I have a data frame in R with two columns: one holds last names, the other holds the frequency of each last name. I would like to randomly select last names according to those frequencies (each a value between 0 and 1).

So far I have tried using the sample function, but I could not see how to give each value its own frequency. Is this possible?


Solution

The sample function takes a prob argument that gives a weight for each element of x:

    df1 <- data.frame(names = c("John", "Mary"), freq = c(0.2, 0.8))
    df1
    #   names freq
    # 1  John  0.2
    # 2  Mary  0.8
    
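    set.seed(1)  # fix the random seed so the draw is reproducible
    # draw 100 names with replacement, weighting each name by its freq
    sample100 <- sample(
      x = df1$names,
      size = 100,
      replace = TRUE,
      prob = df1$freq)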
    
    table(sample100)
    # sample100
    # John Mary 
    #   17   83
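
Since the title asks about sampling rows rather than just names, the same idea can be applied to row indices. The following is a minimal sketch, not the only way to do it: the three-row data frame and the names idx and sampled_rows are made up for illustration. The weights passed to prob do not have to sum to 1, because sample rescales them internally.

```r
# Hypothetical example data; the names and weights are assumptions for illustration.
df1 <- data.frame(names = c("John", "Mary", "Alex"),
                  freq  = c(0.2, 0.7, 0.1))

set.seed(1)  # reproducible draw

# Draw 10 row indices with replacement, weighting each row by its freq value.
idx <- sample(seq_len(nrow(df1)), size = 10, replace = TRUE, prob = df1$freq)

# Index the data frame with the drawn indices to get whole rows back.
sampled_rows <- df1[idx, ]

# Compare the observed share of each name with the freq column.
prop.table(table(sampled_rows$names))
```

With a small size the observed shares can differ noticeably from freq; they get closer as size grows. To draw without repeats, set replace = FALSE and keep size no larger than the number of rows.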