Here is my example:
library(dplyr)

n_experiments <- 1000
a <- sample(1:3, n_experiments, replace = TRUE)
b <- sample(1:3, n_experiments, replace = TRUE)
my_df <- data.frame(a = a, b = b)

set.seed(7)
my_df <- my_df %>%
  rowwise() %>%
  mutate(
    col_1 = sample(setdiff(c(1, 2, 3), unique(c(a, b))), 1),
    col_2 = sample(setdiff(c(1, 2, 3), unique(c(a, b))), 1),
    set = I(list(unique(c(a, b)))),
    set_diff = I(list(setdiff(c(1, 2, 3), unique(c(a, b)))))
  )
Unfortunately, I do not know how to make this example reproducible for everyone, but here is the output I get on my computer.
The very first row shows that col_1 and col_2 are different, while I expect them to be the same. Moreover, I expect col_1 and col_2 to be sampled from the set_diff column. Could anybody help me find my mistake?
The very first row shows that col_1 and col_2 are different, while I expect them to be the same.
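set.seed(7) makes the random draws that follow it reproducible: every time you run your script, the sample calls after it return the same sequence of values. (In your code, a and b are sampled before set.seed(7), so they still change between runs; move the call to the top of the script if you want the whole my_df to be the same every time.) It does not mean that every single call to sample returns the same number, so col_1 and col_2 do not need to be the same. However, if you run the seeded code twice on the same a and b, both runs will give you the same col_1.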
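To illustrate the point about seeding, here is a minimal base R sketch. The names first_run and second_run are made up for this example. It only shows that re-seeding replays the same sequence of draws; it does not make two successive draws equal to each other.

```r
# Seed once, then make two successive draws.
set.seed(7)
first_run <- c(sample(1:1000, 1), sample(1:1000, 1))

# Re-seed with the same value and repeat the same two calls.
set.seed(7)
second_run <- c(sample(1:1000, 1), sample(1:1000, 1))

# The whole sequence is replayed, so the two runs match element by element.
identical(first_run, second_run)

# Within a single run, the first and second draw are separate draws from
# the generator and are not forced to agree.
first_run
```

If col_1 and col_2 are meant to hold the same value, one option is to draw once and reuse the result (for example col_2 = col_1 inside mutate) instead of calling sample twice.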
I expect col_1 and col_2 be sampled from set_diff column.
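From the documentation of sample: "If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x." Therefore, if set_diff is the single value 3, the sample is drawn from c(1,2,3) rather than from 3 alone. This only happens in rows where a and b differ; when a equals b, set_diff has two elements and sample picks one of them as you expect.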
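The length-1 behaviour described above can be sketched as follows. safe_sample is a hypothetical helper written for this answer, not a function from base R or dplyr; it samples an index with sample.int and then subsets, so a single remaining value is returned as-is.

```r
# Hypothetical helper: always picks from the elements of x,
# even when x has length 1.
safe_sample <- function(x, size = 1) x[sample.int(length(x), size)]

# With a = 1 and b = 2, only the value 3 is left.
remaining <- setdiff(c(1, 2, 3), c(1, 2))

set.seed(7)
# sample(3, 1) is treated as sample(1:3, 1), so 1 and 2 can show up too.
draws_plain <- replicate(100, sample(remaining, 1))
# The helper indexes into `remaining`, so every draw is 3.
draws_safe <- replicate(100, safe_sample(remaining))

all(draws_plain %in% 1:3)
all(draws_safe == 3)  # TRUE
```

In the rowwise mutate, replacing sample(setdiff(...), 1) with safe_sample(setdiff(...)) should keep col_1 inside set_diff for every row, assuming set_diff is never empty (it cannot be here, since a and b cover at most two of the three values).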