r, random, sampling

Randomly assign different values to rows with different probabilities in R


I have a data frame like this:

ID var
1  NA
2  NA
3  NA
4  NA
...

I need to randomly set var to A for 20% of the rows, B for 30% of the rows, and C for 50% of the rows.

Is there some efficient way to solve this?


Solution

  • Suppose your data frame is named df. Then you can write:

    randvar = sample(c('A','B','C'),size = nrow(df),prob = c(0.2,0.3,0.5),replace = TRUE)
    df$var = randvar
    

    The code above draws each row independently, so the proportions come out only approximately 20%, 30%, and 50%. If you want exactly 20% "A", 30% "B", and 50% "C", it takes more than one line. Assuming c(0.2,0.3,0.5)*nrow(df) are all integers, my answer is:

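    n = nrow(df)
    df$var = "C"  # initialize every value to "C"
    index = seq_len(n)
    # pick 20% of the indices for "A"; round() guards against floating-point error in 0.2*n
    indexa = sample(index, round(0.2*n))
    # pick 30% of the indices for "B", ruling out the "A" indices already picked;
    # setdiff() still works when indexa is empty, unlike index[-indexa]
    indexb = sample(setdiff(index, indexa), round(0.3*n))
    df$var[indexa] = "A"  # assign "A" to df$var at indexa
    df$var[indexb] = "B"  # assign "B" to df$var at indexb
    # the remaining 50% stay "C"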
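
An alternative way to get exact proportions, sketched here as a variation rather than as part of the original answer, is to build the full vector of labels first and then shuffle it. The helper name assign_exact and its arguments are hypothetical. It assumes prob sums to 1, and when prob * n is not a whole number it hands the leftover rows to the labels with the largest fractional parts, which is one possible rounding rule among several.

```r
# Hypothetical helper (not from the original answer): build the label pool
# in exact counts, then shuffle it. Assumes prob sums to 1.
assign_exact = function(n, labels = c("A", "B", "C"), prob = c(0.2, 0.3, 0.5)) {
  # whole rows per label; the small epsilon guards against floating-point error
  counts = floor(prob * n + 1e-9)
  leftover = n - sum(counts)  # rows lost by rounding down
  if (leftover > 0) {
    # give the leftover rows to the labels with the largest fractional parts
    extra = order(prob * n - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[extra] = counts[extra] + 1
  }
  pool = rep(labels, times = counts)  # e.g. "A" x4, "B" x6, "C" x10 for n = 20
  pool[sample.int(n)]                 # random permutation of the pool
}

set.seed(42)  # only to make the example reproducible
df = data.frame(ID = 1:20, var = NA)
df$var = assign_exact(nrow(df))
table(df$var)
```

For the 20-row example, table(df$var) reports 4 rows of "A", 6 of "B", and 10 of "C", whatever the seed. Only the positions of the labels change between runs.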