I have a 2x4 matrix `A` and a 3x4 matrix `B`. I would like to get an array `C` of dimensions (2, 3, 4) where the `ijk` entry of `C` is the product of the `ik` entry of `A` and the `jk` entry of `B`.
Is there a fast way to do this in R that avoids a loop? Example below with two ways to calculate what I am looking for, both of which involve `for` loops.
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

C1 <- array(NA, dim = c(2, 3, 4))
for (ii in 1:2) {
  for (jj in 1:3) {
    C1[ii, jj, ] <- A[ii, ] * B[jj, ]
  }
}

C2 <- array(NA, dim = c(2, 3, 4))
for (ss in 1:4) {
  C2[, , ss] <- outer(A[, ss], B[, ss])
}
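For completeness, the construction can also be done with no loop or `*apply` call at all, by recycling rows with `rep` and reshaping. This is a sketch of my own (the name `Cvec` is not from the code above); column-major filling is what makes the index bookkeeping work out:

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# Repeat A's rows nrow(B) times (interleaved) and B's rows nrow(A) times
# (blocked), multiply elementwise, then fold the 6x4 product into a 2x3x4
# array. Column-major filling makes entry (i, j, k) equal A[i, k] * B[j, k].
Cvec <- array(
  A[rep(seq_len(nrow(A)), times = nrow(B)), ] *
    B[rep(seq_len(nrow(B)), each = nrow(A)), ],
  dim = c(nrow(A), nrow(B), ncol(A))
)
dim(Cvec)
# [1] 2 3 4
```

This avoids both the explicit loops and the per-slice `outer` calls; whether it beats the other variants will depend on the matrix sizes, and I have not benchmarked it here.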
Not much of an improvement, but using `abind`:
C3 <- do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[,ss], B[,ss])), along=3))
NOTE: the previous time I ran this benchmark was on a different laptop running R-4.0.5, and in that case, running the benchmark several times, `C1`'s performance was on par with `C2`'s. I switched laptops (for other reasons), saw @jay.sf's answer and wanted to add it to the fray, and now `C1` is significantly faster. I cannot explain the difference, but the relevant specs:
It seems to me that the "turbulence" in the benchmarks is due to the small data size: the timings are dominated by per-call administrative overhead rather than the arithmetic itself. If the matrices were much larger, I would expect the methods to break out more clearly (not yet verified).
bench::mark(
  C1 = {
    for (ii in 1:2) {
      for (jj in 1:3) {
        C1[ii, jj, ] <- A[ii, ] * B[jj, ]
      }
    }
    as.numeric(C1)
  },
  C2 = {
    for (ss in 1:4) {
      C2[, , ss] <- outer(A[, ss], B[, ss])
    }
    as.numeric(C2)
  },
  C3 = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(A)), function(ss) outer(A[, ss], B[, ss])), along = 3))),
  C4 = as.numeric(vapply(1:4, \(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3)))
)
# # A tibble: 4 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 C1 16.6us 52.3us 18669. 240B 0 9317 0 499ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [9,317]> <tibble [9,317 x 3]>
# 2 C2 25.8us 85.6us 11225. 240B 2.14 5235 1 466ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [5,236]> <tibble [5,236 x 3]>
# 3 C3 763us 797.4us 1144. 720B 0 572 0 500ms <dbl [24]> <Rprofmem [3 x 3]> <bench_tm [572]> <tibble [572 x 3]>
# 4 C4 24.8us 36.7us 21752. 240B 2.18 9999 1 460ms <dbl [24]> <Rprofmem [1 x 3]> <bench_tm [10,000]> <tibble [10,000 x 3]>
(I added `as.numeric` to each of them because some return ints and some return doubles, and I didn't want to `as.numeric` just one and bias the benchmark. It's not strictly required for any of them, but now we can be assured that they are all equal, since otherwise `bench::mark` would fail, complaining that the output differs.)
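As a side note on where the int/double split comes from (my reading of the code, not something the benchmark itself reports): the loop and `abind` variants inherit integer type from `A` and `B`, while `vapply` coerces each slice to the type of its `FUN.VALUE` template, `matrix(0, 2, 3)`, which is double:

```r
A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)

# Integer * integer stays integer, so the loop/outer/abind variants are integer...
typeof(outer(A[, 1], B[, 1]))
# [1] "integer"

# ...while vapply coerces results to match its double FUN.VALUE template.
typeof(vapply(1:4, function(ss) outer(A[, ss], B[, ss]), matrix(0, 2, 3)))
# [1] "double"
```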
I like @jay.sf's answer because it is both fast and does not require additional packages (`abind` is non-standard but handy to have nonetheless).
Let's grow the data a little.
Abig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, A, simplify = FALSE)), simplify = FALSE))
Bbig <- do.call(cbind, replicate(50, do.call(rbind, replicate(50, B, simplify = FALSE)), simplify = FALSE))
C1big <- C2big <- array(dim=c(dim(Abig)[1], dim(Bbig)))
dim(Abig)
# [1] 100 200
Now a different benchmark, this time on Windows 11 with R-4.1.2:
bench::mark(
  C1big = {
    for (ii in seq(nrow(Abig))) {
      for (jj in seq(nrow(Bbig))) {
        C1big[ii, jj, ] <- Abig[ii, ] * Bbig[jj, ]
      }
    }
    as.numeric(C1big)
  },
  C2big = {
    for (ss in seq(ncol(Abig))) {
      C2big[, , ss] <- outer(Abig[, ss], Bbig[, ss])
    }
    as.numeric(C2big)
  },
  C3big = as.numeric(do.call(abind::abind, c(lapply(seq(ncol(Abig)), function(ss) outer(Abig[, ss], Bbig[, ss])), along = 3))),
  C4big = as.numeric(vapply(seq(ncol(Abig)), function(ss) outer(Abig[, ss], Bbig[, ss]), matrix(0, nrow(Abig), nrow(Bbig)))),
  iterations = 30
)
# # A tibble: 4 x 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 C1big 100.7ms 135ms 7.38 83.5MB 2.68 22 8 2.98s <dbl [3,000,000]> <Rprofmem [75,001 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 2 C2big 22ms 24.4ms 36.0 46.8MB 7.20 25 5 694.7ms <dbl [3,000,000]> <Rprofmem [1,801 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 3 C3big 26.5ms 28.4ms 32.8 92.6MB 28.7 16 14 488.54ms <dbl [3,000,000]> <Rprofmem [1,632 x 3]> <bench_tm [30]> <tibble [30 x 3]>
# 4 C4big 10ms 17.4ms 61.2 46.7MB 18.6 23 7 375.9ms <dbl [3,000,000]> <Rprofmem [1,402 x 3]> <bench_tm [30]> <tibble [30 x 3]>
It seems like we're getting to where I would expect: @jay.sf's `vapply` implementation should do very well, and it appears to be breaking out, with `itr/sec` and `mem_alloc` being good indicators.
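The `vapply` idea generalizes to any pair of matrices with the same number of columns. A small sketch (the function name `rowwise_outer` is my own invention, not from @jay.sf's answer):

```r
# Stack outer(A[, s], B[, s]) over columns s into an
# nrow(A) x nrow(B) x ncol(A) array; vapply's FUN.VALUE
# template fixes the shape (and type) of each slice.
rowwise_outer <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  vapply(seq_len(ncol(A)),
         function(ss) outer(A[, ss], B[, ss]),
         FUN.VALUE = matrix(0, nrow(A), nrow(B)))
}

A <- matrix(1:8, 2, 4)
B <- matrix(11:22, 3, 4)
C <- rowwise_outer(A, B)
dim(C)
# [1] 2 3 4
```

Because `vapply` already returns its results with the dimensions of `FUN.VALUE` plus one, no explicit `array()` reshaping is needed.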