Simulating a representative dataset in R

Suppose I have the following data frame:

sectoral_data <- data.frame(sector=c("a","b","c","d"),share=c(0.5,0.3,0.1,0.1),avg_wage=c(400,600,800,1000))

where "share" is the employment share in each sector. I want to simulate (I guess that's the right word) the following data frame, which would represent a sample of ten individuals from that economy:

personal_data <- data.frame(individual=c(1:10),
                          wage=c(rep.int(400,5),rep.int(600,3),rep.int(800,1), rep.int(1000,1)),
                          sector=c(rep("a",5),rep("b",3), rep("c",1), rep("d",1))
                          )

Is there an efficient way to do this, or a built-in function for it?

Solution

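You can use sample, but draw the row indices once so that each individual's wage stays matched to their sector. Sampling wage and sector in two separate calls would pair them at random (for example, a wage of 400 in sector "c"):

n <- 10

# Draw n row indices of sectoral_data, weighted by employment share
idx <- sample(nrow(sectoral_data), size = n, replace = TRUE,
              prob = sectoral_data$share)

# Look up wage and sector through the same indices so they stay consistent
data.frame(
  individual = seq_len(n),
  wage = sectoral_data$avg_wage[idx],
  sector = sectoral_data$sector[idx]
)
# The rows differ from run to run; call set.seed() first for a reproducible draw.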
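
If you need exactly the proportional allocation shown in the question rather than a random draw, a minimal sketch is to repeat each sector's row according to its share with rep. This assumes share * n comes out as whole numbers (as it does here for n = 10); otherwise the rounded counts may not sum to n. The names counts and idx are only illustrative.

```r
sectoral_data <- data.frame(
  sector = c("a", "b", "c", "d"),
  share = c(0.5, 0.3, 0.1, 0.1),
  avg_wage = c(400, 600, 800, 1000)
)

n <- 10

# Individuals per sector; round() guards against floating-point noise in share * n
counts <- round(sectoral_data$share * n)

# Repeat each row index as many times as that sector has individuals
idx <- rep(seq_len(nrow(sectoral_data)), times = counts)

personal_data <- data.frame(
  individual = seq_along(idx),
  wage = sectoral_data$avg_wage[idx],
  sector = sectoral_data$sector[idx]
)
```

With sample the sector proportions only approach the shares as n grows, whereas this version reproduces them exactly for any n that divides evenly.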