Search code examples
rmatrixsampling

Sampling within a 0/1 matrix


I have a $N$x$N$ matrix $G$ composed of zeros and ones. I would like to resample only the ones in $G$ from a Bernoulli distribution where probability $p$ varies along the raws. The idea is to replace all ones in $G$ with realizations from independent Bernoulli variables. I have tried this in R :

for(i in 1:N){ 
for(j in 1:N){ 
G[i,j] <- ifelse(G[i,j] == 1,rbinom(1,1,p[i]),0)}} 

However, for computation time, it would be better to avoid the use of for loops. Would anyone know how to write this in matrix form ? Thank you very much!


Solution

  • Exploit R's recycling of vectors and its column-oriented storage. I'll explain after showing one solution.

    #
    # Generate a 0/1 matrix `G` for testing.
    #
    N <- 1e3
    G <- matrix(rbinom(N^2, 1, 1/2), N)
    #
    # Generate the vector of probabilities for testing.
    #
    p <- runif(N)
    #
    # Here is a solution.
    #
    U <- runif(N*N) <= p # Random 0/1 values
    G.0 <- G * U         # Conditionally replace elements of G
    

    The runif function generates one uniform variate for each entry in G. Comparing it to the vector p creates N vectors of 0/1 values (1 for true, 0 for false), all strung together, with the expected frequency of ones in locations i, i+N, i+2N, ..., i+(N-1)N given by the value of p[i] for i=1, 2, ..., N. When this is multiplied by G in the last line, the new values are zeros when either the old value was zero or the new one is, which is the desired result because the values of G internally are also laid out column by column.

    This solution is about 20 times faster than the original double for loop. It avoids the slowness of conditional execution (ifelse) and exploits any internal parallelism and optimization of R's basic operations (runif, <=, and *). You do pay a (relatively small) price: RAM to store these random results (here explicitly placed into array U) is needed.