I'd like to allocate a random zone to all the elements in a data frame.
Say the data frame has this structure:
df:
age, height, nation, zone
13,'tall','American', -
.....
11,'tall','S.american', -
I want to fill the zone column, whose possible values are 'A', 'B', and 'C'. The zones have different probabilities, for example:
prob(A)=0.1
prob(B)=0.3
prob(C)=0.6
How can I assign a zone to every row of df according to these probabilities?
Thanks in advance, p.
This should do it:
df$zone <- sample(LETTERS[1:3], nrow(df), replace = TRUE, prob = c(0.1, 0.3, 0.6))
You can replace LETTERS[1:3] with c("A", "B", "C") or whatever strings you want.
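As a quick sanity check, here is a minimal sketch of the same approach on a made-up data frame. The rows, the seed, and the names zones, probs and big are assumptions for illustration only, not part of the original data. It assigns zones with sample() and then draws a larger sample to see whether the observed proportions land near the requested probabilities.

```r
# Hypothetical example data; the real df would come from the asker.
df <- data.frame(
  age    = c(13, 11, 12, 14),
  height = c("tall", "tall", "short", "tall"),
  nation = c("American", "S.american", "American", "S.american")
)

zones <- c("A", "B", "C")
probs <- c(0.1, 0.3, 0.6)   # prob(A), prob(B), prob(C), in the same order as zones

set.seed(42)                # assumed seed, only to make the draw reproducible

# One independent draw per row; replace = TRUE is required because
# there are more rows than distinct zones.
df$zone <- sample(zones, nrow(df), replace = TRUE, prob = probs)
print(df)

# With only a few rows the zone counts will look uneven, so check the
# proportions on a large sample instead.
big <- sample(zones, 1e5, replace = TRUE, prob = probs)
print(round(prop.table(table(factor(big, levels = zones))), 2))
```

Each row gets its zone independently, so the proportions in a small data frame can differ noticeably from 0.1, 0.3 and 0.6. They only converge to those values as the number of rows grows.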