
Faster computation of double for loop?


I have a piece of working code that takes far too many hours (possibly days) to compute. I have a sparse matrix of 1s and 0s. I need to subtract each row from every other row, in all possible combinations, multiply the resulting vector by another vector, and finally average the positive values in it to get a single scalar, which I need to insert into a matrix. What I have is:

m <- matrix( 
c(0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0), nrow=4,ncol=4,
byrow = TRUE)   

b <- c(1,2,3,4)

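# d must exist before the loop assigns into it
d <- matrix(NA_real_, nrow = nrow(m), ncol = nrow(m))

for (j in 1:nrow(m)){
  for (i in 1:nrow(m)){
    a <- m[j,] - m[i,]     # row j minus row i
    a[i] <- 0L             # ignore column i
    a[a < 0] <- 0L         # drop negative differences
    ab <- a*b              # named ab rather than c, to avoid masking c()
    d[i,j] <- mean(ab[ab > 0])
  }
}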

The desired output is a matrix with the same dimensions as m, where each entry is the result of these operations. This loop works, but are there any ideas on how to make it more efficient? Thank you.
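One possible speed-up, sketched here as a swapped-in technique rather than the poster's code: because m is 0/1, and assuming every b[k] > 0, an entry is positive exactly when m[j, k] == 1, m[i, k] == 0 and k != i. So d[i, j] is the sum of b[k] over those k divided by their count, and both the sum and the count are matrix products. The sketch assumes m is square (the loop's a[i] <- 0L already uses a row index as a column index) and b is strictly positive; if b could be zero or negative, the filter c[c > 0] behaves differently and this shortcut would not match. The names pairwise_mean_vec and pairwise_mean_loop are hypothetical.

```r
# Vectorized sketch (not from the original post).
# Assumptions: m is a square 0/1 matrix and every element of b is > 0.
pairwise_mean_vec <- function(m, b) {
  stopifnot(nrow(m) == ncol(m), length(b) == ncol(m), all(b > 0))
  # z[i, k] is 1 when column k can count for row i: m[i, k] == 0 and k != i
  z <- 1 - m
  diag(z) <- 0
  # mb[j, k] = m[j, k] * b[k]
  mb <- sweep(m, 2, b, "*")
  num <- tcrossprod(z, mb)  # num[i, j] = sum over k of z[i, k] * m[j, k] * b[k]
  den <- tcrossprod(z, m)   # den[i, j] = number of k with z[i, k] * m[j, k] == 1
  num / den                 # 0 / 0 gives NaN, like mean() of an empty vector
}

# Reference: the question's double loop, with d initialised first
pairwise_mean_loop <- function(m, b) {
  n <- nrow(m)
  d <- matrix(NA_real_, n, n)
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      a <- m[j, ] - m[i, ]
      a[i] <- 0
      a[a < 0] <- 0
      cb <- a * b
      d[i, j] <- mean(cb[cb > 0])
    }
  }
  d
}

m <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 1,
              0, 1, 0, 0,
              1, 1, 0, 0), nrow = 4, byrow = TRUE)
b <- c(1, 2, 3, 4)

d_vec <- pairwise_mean_vec(m, b)
d_loop <- pairwise_mean_loop(m, b)
all.equal(d_vec, d_loop)  # TRUE
d_vec[3, 2]               # 2.5, i.e. mean(c(1, 4))
```

The two tcrossprod() calls run in compiled BLAS code instead of an R-level double loop, which is usually where the time goes. For a very large n, the intermediates z, num and den are dense n x n matrices, so memory rather than time may become the limit.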


Solution

  • My naive solution is to use the apply or sapply functions instead of the for loop to do the iterations:

    sapply(1:dim(m)[1], function(k) {
      z <- t(apply(m, 1, function(x) m[k,] - x))  # row i is m[k,] - m[i,]
      diag(z) <- 0                                # same as a[i] <- 0L
      z[z < 0] <- 0
      apply(t(apply(z, 1, function(x) x*b)), 1, function(x) mean(x[x > 0]))
    })
    

    I compared the running time of your solution and this one on my computer. Yours:

    t1 <- Sys.time()
    d1 <- m
    for (j in 1:dim(m)[1]){
      for (i in 1:dim(m)[1]){
        a <- m[j,] - m[i,]
        a[i] <- 0L
        a[a < 0] <- 0L
        c <- a*b
        d1[i,j] <- mean(c[c > 0])
      }
    }
    Sys.time()-t1
    

    Yours reports "Time difference of 0.02799988 secs". Mine is only a little faster, reporting "Time difference of 0.01899815 secs", when you run:

    t2 <- Sys.time()
    d2 <- sapply(1:dim(m)[1], function(k) {
      z <- t(apply(m, 1, function(x) m[k,] - x))
      diag(z) <- 0
      z[z < 0] <- 0
      apply(t(apply(z, 1, function(x) x*b)), 1, function(x) mean(x[x > 0]))
    })
    Sys.time()-t2
    

    You can try both on your own computer with a larger matrix. Good luck!
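A single run on a 4 x 4 matrix is too small to tell the two approaches apart. As a rough sketch, the code below wraps both versions in hypothetical functions (loop_version, sapply_version), runs each once on a made-up random 150 x 150 0/1 matrix with system.time(), and checks that they return the same result. Note that the apply family still calls an R function for every row, so it is not generally expected to be dramatically faster than an explicit for loop.

```r
# Hypothetical wrappers around the two approaches above (names are illustrative)
loop_version <- function(m, b) {
  n <- nrow(m)
  d <- matrix(NA_real_, n, n)
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      a <- m[j, ] - m[i, ]
      a[i] <- 0
      a[a < 0] <- 0
      cb <- a * b
      d[i, j] <- mean(cb[cb > 0])
    }
  }
  d
}

sapply_version <- function(m, b) {
  sapply(seq_len(nrow(m)), function(k) {
    z <- t(apply(m, 1, function(x) m[k, ] - x))  # row i holds m[k, ] - m[i, ]
    diag(z) <- 0
    z[z < 0] <- 0
    zb <- t(apply(z, 1, function(x) x * b))      # multiply each row by b
    apply(zb, 1, function(x) mean(x[x > 0]))
  })
}

# Made-up test data: a random square 0/1 matrix with roughly 10% ones
set.seed(42)
n <- 150
m_big <- matrix(rbinom(n * n, size = 1, prob = 0.1), nrow = n)
b_big <- runif(n, min = 1, max = 10)

t_loop <- system.time(d_loop <- loop_version(m_big, b_big))
t_sapply <- system.time(d_sapply <- sapply_version(m_big, b_big))

# Elapsed seconds for each approach (values depend on the machine)
print(rbind(loop = t_loop[["elapsed"]], sapply = t_sapply[["elapsed"]]))

# Both approaches should produce the same matrix
all.equal(d_loop, d_sapply)
```

Timings from a single run are noisy; repeating each call a few times, or increasing n, gives a fairer comparison.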