I'm trying to generate dummy data by sampling from a discrete distribution that depends on the level of a factor (a different distribution for each level), and I want to store each random result in a new data frame column, in the row for that factor level. If you run the code below you will see that 'data$last' is empty. I'm not sure what I'm doing wrong. I've also tried it without the loop, by setting the number of replications to 100 for each level, but then the distributions are incorrect.
#Create data frame with factor
set.seed(1)
ID<-(1:200)
gender<-sample(x = c("Male","Female"), 200, replace = T, prob = c(0.5, 0.5))
data<-data.frame(ID,gender)
#Generate random response based on discrete distribution conditional on gender
data$last <- for (i in 1:nrow(data)) {if(data$gender=="Male") {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
}
You should rewrite your for loop so that each data$last value is assigned inside the loop, and test data$gender[i] (the current row) rather than the whole data$gender column:
# Create the column first, then fill it in one row at a time
data$last <- NA_character_
for (i in seq_len(nrow(data))) {
  if (data$gender[i] == "Male") {
    data$last[i] <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.8, 0.2))
  } else {
    data$last[i] <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.3, 0.7))
  }
}
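As a small illustration of why the original assignment left the column empty: a for loop is evaluated for its side effects and its own value is NULL, and assigning NULL to a data frame column adds nothing. The names `result` and `df` below are only for this demonstration. Separately, the original condition compared the whole gender column at once; recent R versions (4.2.0 and later) stop with an error when `if` gets a condition longer than one element, while older versions only warn and use the first element.

```r
# The value of a for loop is NULL, whatever happens in its body
result <- for (i in 1:3) i
is.null(result)   # TRUE

# Assigning NULL to a data frame column does not create the column
df <- data.frame(x = 1:3)
df$y <- result
names(df)         # "x"
```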
Or without a for loop, using ifelse. Draw one value per row from each distribution, because ifelse takes element i from the first or the second vector depending on the test (shorter vectors would be recycled, so some draws would be reused):
data$last = ifelse(data$gender=="Male",
  sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.8, 0.2)),
  sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.3, 0.7)))
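If the factor has more than two levels, one possible generalisation is to keep the probabilities in a named list and fill the rows of each level in turn. This is only a sketch: `outcomes`, `probs`, `lev` and `rows` are illustrative names that do not appear in the question, and the data frame is rebuilt here so the block runs on its own.

```r
set.seed(1)
data <- data.frame(ID = 1:200,
                   gender = sample(c("Male", "Female"), 200, replace = TRUE),
                   stringsAsFactors = TRUE)

# Assumed lookup: one probability vector per factor level,
# given in the same order as `outcomes`
outcomes <- c("Today", "Yesterday")
probs <- list(Male = c(0.8, 0.2), Female = c(0.3, 0.7))

data$last <- NA_character_
for (lev in names(probs)) {
  rows <- which(data$gender == lev)   # rows belonging to this level
  data$last[rows] <- sample(outcomes, length(rows),
                            replace = TRUE, prob = probs[[lev]])
}

# Row-wise proportions; each row should be close to its entry in `probs`
round(prop.table(table(data$gender, data$last), margin = 1), 2)
```

The final table is a quick sanity check. With about 100 rows per level the proportions will not match the target probabilities exactly.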