I have a simulation in which I want to sample rows according to a given condition and probability.
I generate the data with this code:
library(tidyr)
df <- data.frame(replicate(6, sample(1:10, 1000, replace = TRUE)))
Now I want to select rows whose rowMeans are greater than or equal to 6 with probability 0.8, and rows whose rowMeans are < 6 with probability 0.2. I am using this code to try to draw a sample of n=30 from the original df:
library(fBasics)
xsample=pop.dataL %>% dplyr::filter(rowSkewness(pop.dataL)>1.5) %>%
dplyr::sample_n(30, weight=c(2,8), replace=T)
but of course I get the error "incorrect number of probabilities", because the weight vector needs one entry per row of the data frame being sampled, and I am passing only two values. I just can't figure out how to build that vector.
Any help will be appreciated...
Thanks!
Use ifelse() to allocate the weights, so that every row gets its own entry. The weights are relative; sample_n() rescales them to sum to 1.
library(dplyr)
df %>%
  sample_n(30, replace = TRUE, weight = ifelse(rowMeans(df) >= 6, 8, 2))
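One caveat, sketched below under assumptions: weights of 8 and 2 apply per row, so the chance that a single draw comes from the "rowMeans >= 6" group also depends on how many rows fall in each group. It equals 0.8 only when the two groups are the same size. If the goal is a 0.8 chance per draw at the group level, one option is to split 0.8 and 0.2 evenly among the rows of each group. The names high, w_row, w_group, p_row and p_group, and the seed, are made up for this illustration.

```r
library(dplyr)

set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(replicate(6, sample(1:10, 1000, replace = TRUE)))

# Flag the rows whose mean is at least 6
high <- rowMeans(df) >= 6

# Per-row weights, as in the answer: 8 for "high" rows, 2 for the rest
w_row <- ifelse(high, 8, 2)

# Group-level weights: each group's total weight is 0.8 or 0.2,
# shared equally among the rows of that group
w_group <- ifelse(high, 0.8 / sum(high), 0.2 / sum(!high))

# Probability that a single draw lands in the "high" group under each scheme
p_row   <- sum(w_row[high]) / sum(w_row)      # depends on the group sizes
p_group <- sum(w_group[high]) / sum(w_group)  # 0.8 by construction

# Draw the sample of 30 rows using the group-level weights
xsample <- df %>%
  sample_n(30, replace = TRUE, weight = w_group)
```

Comparing p_row with p_group on your own data shows whether the simpler 8/2 weighting is close enough for your purpose. In current dplyr versions sample_n() is superseded by slice_sample(), which takes the same kind of vector through its weight_by argument.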