Tags: r, for-loop, sample

How to sample from a discrete distribution conditional on a factor within a for/if loop


I'm trying to generate dummy data by sampling from a discrete distribution that depends on the level of a factor (a different distribution for each level), and I want to put each random result into a new data frame column, in the row for that factor level. If you run the code below, you will see that 'data$last' is empty. I'm not sure what I'm doing wrong. I have also tried it without the loop, by setting the number of replications to 100 for each level, but the resulting distributions are incorrect.

#Create data frame with factor 
set.seed(1)
ID<-(1:200)
gender<-sample(x = c("Male","Female"), 200, replace = T, prob = c(0.5, 0.5))
data<-data.frame(ID,gender)

#Generate random response based on discrete distribution conditional on gender
data$last <- for (i in 1:nrow(data)) {if(data$gender=="Male") {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
}

Solution

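  • Rewrite your for loop so that each data$last value is assigned inside the loop. A for loop itself returns NULL, so assigning the whole loop to data$last leaves the column empty, and data$gender must be indexed with [i] so that if() tests a single value: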

    for (i in 1:nrow(data)) {
      if(data$gender[i]=="Male") {
        data$last[i] = sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
      } else {
        data$last[i] = sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
      }
    }
    

    Or without a for loop, drawing a full-length vector for each case so that ifelse() does not recycle shorter vectors:

    data$last = ifelse(data$gender == "Male",
                       sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.8, 0.2)),
                       sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.3, 0.7)))
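
Another option, not part of the original answer, is to fill the column by logical subset assignment, so that each group receives exactly as many draws as it has rows. The sketch below rebuilds the question's data frame so it runs on its own; the helper name `is_male` is introduced here purely for illustration. The final table is one way to check that the row-wise proportions land near the intended probabilities.

```r
# Rebuild the question's data frame so this sketch is self-contained.
set.seed(1)
data <- data.frame(
  ID = 1:200,
  gender = sample(c("Male", "Female"), 200, replace = TRUE, prob = c(0.5, 0.5))
)

# Logical index of the male rows (illustrative helper name).
is_male <- data$gender == "Male"

# Pre-allocate the column, then fill each group with its own draws.
data$last <- NA_character_
data$last[is_male] <- sample(c("Today", "Yesterday"), sum(is_male),
                             replace = TRUE, prob = c(0.8, 0.2))
data$last[!is_male] <- sample(c("Today", "Yesterday"), sum(!is_male),
                              replace = TRUE, prob = c(0.3, 0.7))

# Row-wise proportions: roughly 0.8/0.2 expected for Male, 0.3/0.7 for Female.
prop.table(table(data$gender, data$last), margin = 1)
```

With only about 100 rows per group, the observed proportions will vary around the target probabilities from one seed to the next.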