Select rows with two different probabilities within an R data frame


I have this simulation where I want to generate rows according to a given condition and probability.

I generate the data with this code:

library(tidyr)
df <- data.frame(replicate(6, sample(1:10, 1000, replace = TRUE)))

Now I want to select rows with rowMeans greater than or equal to 6 with probability 0.8, and rows with rowMeans < 6 with probability 0.2. I am using this code to try to draw a sample of n = 30 from the original df, favouring rows with a row mean of at least 6:

library(fBasics)
xsample=pop.dataL %>% dplyr::filter(rowSkewness(pop.dataL)>1.5) %>% 
dplyr::sample_n(30, weight=c(2,8), replace=T)

But of course I get the error "incorrect number of probabilities", because the weight vector needs one entry per row, i.e. its length must equal nrow(df). I just can't figure out how to build it.

Any help will be appreciated...

Thanks!


Solution

  • Use ifelse() to build a weight vector with one entry per row: 8 for rows whose mean is at least 6, and 2 for all other rows.

    library(dplyr)  # provides sample_n() and %>%
    df %>%
      sample_n(30, replace = TRUE, weight = ifelse(rowMeans(df) >= 6, 8, 2))
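One point that may be worth checking: weights of 8 and 2 apply per row, so each row with a mean of at least 6 is four times as likely to be drawn as each other row. The overall chance that a draw lands in the high-mean group then depends on how many rows each group contains, and is generally not 0.8. If the goal is for each draw to come from the high-mean group with probability 0.8, one option is to split 0.8 and 0.2 evenly across the rows of each group. The sketch below is an illustration under that assumption, not part of the original answer. It uses base R's sample() instead of sample_n(), and the names high, w_row, w_group and idx are made up for the example.

```r
# Sketch only: base R, no extra packages needed.
set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- data.frame(replicate(6, sample(1:10, 1000, replace = TRUE)))

# TRUE for rows whose mean is at least 6
high <- rowMeans(df) >= 6

# Per-row weights, as in the answer above: every "high" row is 4x as
# likely to be drawn as every "low" row.
w_row <- ifelse(high, 8, 2)

# Group-level weights (an assumption about the intended meaning):
# spread 0.8 over the high rows and 0.2 over the low rows, so a single
# draw is a high row with probability 0.8.
w_group <- ifelse(high, 0.8 / sum(high), 0.2 / sum(!high))

# Draw 30 row indices with replacement and subset the data frame
idx <- sample(nrow(df), size = 30, replace = TRUE, prob = w_group)
xsample <- df[idx, ]
```

Because the draws are made with replacement, each of the 30 draws is independent, so about 80% of the sampled rows should come from the high-mean group on average. The same w_group vector could also be passed as the weight argument of sample_n().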