Suppose I have the following data frame:
sectoral_data <- data.frame(
  sector = c("a", "b", "c", "d"),
  share = c(0.5, 0.3, 0.1, 0.1),
  avg_wage = c(400, 600, 800, 1000)
)
where "share" is the employment share of each sector. I want to simulate (I guess that's the right word) the following data frame, which would represent a sample of ten individuals from that economy:
personal_data <- data.frame(
  individual = 1:10,
  wage = c(rep.int(400, 5), rep.int(600, 3), rep.int(800, 1), rep.int(1000, 1)),
  sector = c(rep("a", 5), rep("b", 3), rep("c", 1), rep("d", 1))
)
Is there an efficient way to do this, or a built-in function for it?
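You can use sample. Draw the row indices once, weighted by share, and use them to look up both columns. Calling sample separately for wage and for sector would draw the two independently, so an individual's wage would not match their sector:
n <- 10
# Draw n row indices of sectoral_data, weighted by employment share
idx <- sample(nrow(sectoral_data), size = n, replace = TRUE, prob = sectoral_data$share)
data.frame(
  individual = seq_len(n),
  wage = sectoral_data$avg_wage[idx],
  sector = sectoral_data$sector[idx]
)
# The result is random and changes on every run (call set.seed() first to
# make it reproducible). Each row pairs a sector with that sector's avg_wage.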
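If you want the counts to match the shares exactly, as in the personal_data example in the question, rather than a random draw, one option is to repeat each row index round(share * n) times. This is only a minimal sketch: it assumes share * n comes out to whole numbers (as it does for n = 10 here), otherwise the rounded counts may not add up to n. The names counts, idx and personal_data_exact are just illustrative.

```r
sectoral_data <- data.frame(
  sector = c("a", "b", "c", "d"),
  share = c(0.5, 0.3, 0.1, 0.1),
  avg_wage = c(400, 600, 800, 1000)
)

n <- 10

# Number of individuals per sector; assumes share * n is (close to) an integer
counts <- round(sectoral_data$share * n)

# Repeat each row index as many times as its sector's count
idx <- rep(seq_len(nrow(sectoral_data)), times = counts)

personal_data_exact <- data.frame(
  individual = seq_along(idx),
  wage = sectoral_data$avg_wage[idx],
  sector = sectoral_data$sector[idx]
)
```

With these shares and n = 10 this gives 5, 3, 1 and 1 individuals in sectors a, b, c and d. For shares that do not divide evenly, you would need to decide how to distribute the leftover individuals.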