sample with dplyr and rowwise


Here is my example:

library(dplyr)

n_experiments <- 1000
a <- sample(1:3, n_experiments, replace = TRUE)
b <- sample(1:3, n_experiments, replace = TRUE)

my_df <- data.frame(a = a, b = b)

set.seed(7)
my_df <- my_df %>%
  rowwise() %>%
  mutate(
    col_1 = sample(setdiff(c(1, 2, 3), unique(c(a, b))), 1),
    col_2 = sample(setdiff(c(1, 2, 3), unique(c(a, b))), 1),
    set = I(list(unique(c(a, b)))),
    set_diff = I(list(setdiff(c(1, 2, 3), unique(c(a, b)))))
  )

Unfortunately, I do not know how to make this example reproducible for everyone, but here is the output I get on my computer:

(Screenshot of the printed my_df; image not available.)

The very first row shows that col_1 and col_2 are different, while I expect them to be the same. Moreover, I expect col_1 and col_2 to be sampled from the set_diff column. Could anybody help me find my mistake?


Solution

  • The very first row shows that col_1 and col_2 are different, while I expect them to be the same.

    set.seed(7) ensures that every run of your script produces the same col_1 and col_2 for a given a and b. It does not make each individual call to sample return the same number: every call advances the random number generator, so col_1 and col_2 are independent draws and need not match. Note also that a and b are drawn before set.seed(7) is called, so they change from run to run; move set.seed(7) above the first sample call to make the whole of my_df reproducible.
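
A minimal sketch of that behaviour in base R (the variable names are only illustrative): resetting the seed replays the same sequence of draws, while consecutive calls after a single seed are separate draws from that sequence.

```r
# Two consecutive draws after one call to set.seed():
# the second call continues the random stream, it does not restart it.
set.seed(7)
first  <- sample(1:100, 1)
second <- sample(1:100, 1)

# Resetting the seed replays the stream from the beginning.
set.seed(7)
first_again  <- sample(1:100, 1)
second_again <- sample(1:100, 1)

identical(c(first, second), c(first_again, second_again))  # TRUE
```

So a seed makes a script repeatable as a whole; it does not tie two separate sample calls to the same value.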

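  • I expect col_1 and col_2 be sampled from set_diff column.

    From the documentation of sample: "If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x." Whenever a and b differ, set_diff contains a single number. If that number is 3, sample(3, 1) draws from c(1, 2, 3) instead of returning 3; if it is 2, the draw comes from c(1, 2). This is why col_1 and col_2 can fall outside set_diff and can differ from each other.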
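
One possible workaround, sketched in base R, is to sample positions instead of values. The helper name safe_sample is an assumption of this sketch, not a dplyr or base function; it follows the "resample" idiom from the examples in ?sample.

```r
# Hypothetical helper (the name is illustrative): draw by position,
# so a length-one vector always returns its only element.
safe_sample <- function(x, size = 1) x[sample.int(length(x), size)]

set.seed(7)

# When a and b differ (here 1 and 2), exactly one value is left over.
leftover <- setdiff(c(1, 2, 3), unique(c(1, 2)))  # 3

# sample() reads the single number 3 as 1:3, so draws may be 1, 2 or 3.
draws_plain <- replicate(200, sample(leftover, 1))

# Sampling positions always gives back the leftover value itself.
draws_safe <- replicate(200, safe_sample(leftover, 1))

sort(unique(draws_plain))
all(draws_safe == 3)  # TRUE
```

Inside mutate, the same idea would mean calling safe_sample(setdiff(c(1, 2, 3), unique(c(a, b))), 1) in place of sample(...). If col_2 must equal col_1 even when a and b are the same (two values left over), it is probably simplest to draw once and set col_2 = col_1.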