Search code examples
rvectorizationouter-join

Efficient complex operations on every combination of columns between two matrices (R)


Given two matrices, I want to perform a complex operation between values in every combination of columns (specifically, I wish to calculate the sum of the square root of the product of the values in two given columns).

This sounded like a job for outer:

set.seed(123)
a <- matrix(abs(rnorm(12)),nrow=4)
a
           [,1]      [,2]      [,3]
[1,] 0.56047565 0.1292877 0.6868529
[2,] 0.23017749 1.7150650 0.4456620
[3,] 1.55870831 0.4609162 1.2240818
[4,] 0.07050839 1.2650612 0.3598138

b <- matrix(abs(rnorm(12)),nrow=4)
b
          [,1]      [,2]      [,3]
[1,] 0.4007715 0.4978505 1.0678237
[2,] 0.1106827 1.9666172 0.2179749
[3,] 0.5558411 0.7013559 1.0260044
[4,] 1.7869131 0.4727914 0.7288912

m <- outer (
  as.data.frame(a),
  as.data.frame(b),
  Vectorize(function (x, y)   sum(sqrt(x*y),na.rm=T))
)
m
         V1       V2       V3
V1 1.919315 2.429191 2.488925
V2 2.672994 3.432185 2.630920
V3 2.373466 2.859966 2.800881

... but this is proving to be prohibitively slow for the size of the matrices I'm working with. What's the fastest way to do this? I don't think this approach is appropriately taking advantage of vectorization.

P.S. the as.data.frame call is a further aspect of this approach I don't like, but I don't get the desired output without it. Instead, outer returns an array of:

dim(m)
[1] 4 3 4 3

Solution

  • sqrt each of a and b and then use crossprod giving the result shown which is the same as m. (Note that the question used random numbers without set.seed so m here is different than in the question.)

    crossprod(sqrt(a), sqrt(b))
    ##          [,1]     [,2]     [,3]
    ## [1,] 1.940283 1.957321 1.914920
    ## [2,] 2.740899 2.611588 2.564304
    ## [3,] 2.983868 2.957246 2.872090
    
    m
    ##          V1       V2       V3
    ## V1 1.940283 1.957321 1.914920
    ## V2 2.740899 2.611588 2.564304
    ## V3 2.983868 2.957246 2.872090