I have a dataset with two columns (X and Y) that both contain exactly the same number of 0s and 1s:
0 1
3790 654
Now I want column Y to contain exactly 1733 1s and 2711 0s, and the 1079 extra 1s (1733 - 654) must be assigned randomly. I already tried the following:
ind <- which(df$X == 0)
ind <- ind[rbinom(length(ind), 1, prob = 1079/3790) > 0]
df$Y[ind] <- 1
But every time I run this code I get a different number of 1s, and I want it to be exactly 1733 on every run. How do I do this?
You have this vector:
x <- sample(c(rep(0, 3790), rep(1, 654)))
#> table(x)
#> x
#> 0 1
#> 3790 654
What you need to do is randomly select the positions of 1079 elements in your vector that equal 0, and assign them the value 1:
s <- sample(which(x == 0), 1079)
x[s] <- 1
#> table(x)
#> x
#> 0 1
#> 2711 1733
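The same idea can be applied directly to the data frame column. The `rbinom()` attempt in the question flips each 0 independently with probability 1079/3790, so the number of flipped values is itself random and only averages 1079, whereas `sample()` without replacement always returns exactly the requested number of positions. Below is a minimal sketch under some assumptions: the data frame `df` is rebuilt here as a stand-in for the real data, the positions are taken from `Y` itself (the question selects them via `X`), and the `set.seed()` value is arbitrary and only there to make the draw repeatable.

```r
# Stand-in data frame (assumption): Y has 3790 zeros and 654 ones, as in the question
set.seed(42)  # arbitrary seed, only so the random draw can be reproduced
df <- data.frame(Y = sample(c(rep(0, 3790), rep(1, 654))))

# Positions in Y that currently hold a 0
zeros <- which(df$Y == 0)

# Draw exactly 1079 of those positions without replacement and set them to 1
df$Y[sample(zeros, 1079)] <- 1

# Count the resulting 0s and 1s
table(df$Y)
```

Because the number of positions drawn is fixed at 1079, the counts are 2711 zeros and 1733 ones on every run; only which rows get flipped depends on the seed.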