r, random, sampling

Allocate random zone to data frame


I'd like to allocate a random zone to every row in a data frame.

Say the data frame has this structure:

df:
age-height-nation -  zone
13,'tall','American', -
.....
11,'tall','S.american', -

I want to fill the column [zone], where the possible values are ('A','B','C'). Each zone has a different probability. For example:

prob(A)=0.1
prob(B)=0.3
prob(C)=0.6

How can I allocate a zone to every row in df according to these probabilities?

Thanks in advance, p.


Solution

  • This should do it:

    df$zone <- sample(LETTERS[1:3], nrow(df), replace = TRUE, prob = c(0.1, 0.3, 0.6))
    

    You can replace LETTERS[1:3] with c("A", "B", "C") or whatever strings you want.
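
As a rough check of the approach, here is a minimal sketch on a made-up data frame. The 10,000 rows, the column contents, and the names `zones`, `probs`, and `observed` are assumptions for illustration only, not part of the question. It assigns the zones with `sample()` and then compares the observed share of each zone with the requested probabilities.

```r
# Hypothetical data frame standing in for the one in the question
set.seed(42)  # fixed seed only so the draw is reproducible
n <- 10000
df <- data.frame(
  age    = sample(10:15, n, replace = TRUE),
  height = sample(c("tall", "short"), n, replace = TRUE),
  nation = sample(c("American", "S.american"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

zones <- c("A", "B", "C")
probs <- c(0.1, 0.3, 0.6)  # must be in the same order as zones

# One independent draw per row, weighted by probs
df$zone <- sample(zones, nrow(df), replace = TRUE, prob = probs)

# Observed share of each zone; should be close to probs when n is large
observed <- prop.table(table(factor(df$zone, levels = zones)))
print(round(observed, 3))
```

Each row is drawn independently, so the observed shares only approximate the target probabilities, and the match gets looser on small data frames. The weights passed to `prob` are matched to the values by position, and `sample()` does not require them to sum to one.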