Tags: r, arrays, foreach, doparallel, nested-for-loop

How do you get an array of matrices back from a nested (parallel) foreach loop?


I have a list of unique objects whose identities matter (so order matters too, but only as a way of tracking identity):

fakeDataList <- list(one = 1,
                     two = 2,
                     three = 3,
                     four = 4)

There's a function that performs pairwise calculations...

fakeInnerFxn <- function(l){
  x <- l[[1]]
  y <- l[[2]]
  low <- x + y - 1
  mid <- x + y
  high <- x + y + 1
  out <- c(low, mid, high)
  return(out)
}

... and returns three values per pair of ids

fakeInnerFxn(fakeDataList[c(1,2)])
#> [1] 2 3 4

The inner function is nested within the outer function which performs each pairwise operation on the full list...

fakeOuterFxn <- function(d){
  n <- length(d)
  out <- array(0, dim = c(n,n,3))
  colnames(out) <- names(d)
  rownames(out) <- names(d)
  # Visit each unordered pair once (i < j), so the diagonal and
  # lower triangle keep their initial value of 0.
  for(i in seq_len(n - 1)){
    for(j in (i+1):n){
      out[i, j, ] <- fakeInnerFxn(d[c(i, j)])
    }
  }

  return(out)
}

... and returns an array of three matrices holding the 'low', 'mid' and 'high' values

fakeOuterFxn(fakeDataList)
#> , , 1
#> 
#>       one two three four
#> one     0   2     3    4
#> two     0   0     4    5
#> three   0   0     0    6
#> four    0   0     0    0
#> 
#> , , 2
#> 
#>       one two three four
#> one     0   3     4    5
#> two     0   0     5    6
#> three   0   0     0    7
#> four    0   0     0    0
#> 
#> , , 3
#> 
#>       one two three four
#> one     0   4     5    6
#> two     0   0     6    7
#> three   0   0     0    8
#> four    0   0     0    0

The actual data is a very long list and the calculations are slow.

How can I parallelize this code with foreach and doParallel in such a way that the array is preserved and the row/column orders are preserved (or at least able to be kept track of and re-ordered at the end)?

library(foreach)
library(doParallel)
#> Loading required package: iterators
#> Loading required package: parallel

registerDoParallel(detectCores()-2)  

The for loop doesn't need to be inside a function, but it'd be neat if it was.

d <- fakeDataList
n <- length(d)

This is really as far as I've been able to get with it:

out <- foreach(i=1:n, .combine = 'c') %:%
  foreach(j=(i+1):n, .combine = 'c') %dopar% {
    if (j <= n) {
      fakeInnerFxn(d[c(i, j)])
    }
  }

The answers are all here, but how do I get an array back?

out
#>  [1] 2 3 4 3 4 5 4 5 6 4 5 6 5 6 7 6 7 8 7 8 9

Created on 2021-06-22 by the reprex package (v1.0.0)


Solution

  • You can always return the indices with the results and reconstruct your array later.

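    n <- length(d)

    # Each task returns its own (i, j) indices together with the three values,
    # so every result can be placed in the right cell afterwards.
    res <- foreach(i=1:(n-1), .combine = 'c') %:%
      foreach(j=(i+1):n) %dopar% {
        list(i, j, fakeInnerFxn(d[c(i, j)]))
      }
    
    # Rebuild the n x n x 3 array and restore the row/column names from d.
    out <- array(0, dim = c(n, n, 3),
                 dimnames = list(names(d), names(d), NULL))
    for (res_k in res) out[res_k[[1]], res_k[[2]], ] <- res_k[[3]]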
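
Since the question mentions it would be neat to keep the loop inside a function, here is one way the same idea might be wrapped up. This is only a sketch, not part of the original answer: `fakeOuterPar` is a hypothetical name, the nested loops are swapped for a single `foreach` over the index pairs from `combn()`, and the `low`/`mid`/`high` labels on the third dimension are an addition. It assumes `d` is a named list with at least two elements, and the worker count of 2 is arbitrary.

```r
library(foreach)
library(doParallel)

# Stand-in for the slow pairwise calculation from the question.
fakeInnerFxn <- function(l) {
  x <- l[[1]]
  y <- l[[2]]
  c(x + y - 1, x + y, x + y + 1)
}

# Hypothetical wrapper; assumes length(d) >= 2 and that d is named.
fakeOuterPar <- function(d) {
  n <- length(d)
  pairs <- combn(n, 2)  # each column is one (i, j) pair with i < j

  # One flat loop over the pairs; each task returns the row
  # c(i, j, low, mid, high). fakeInnerFxn lives in the global
  # environment, so it is exported explicitly for cluster backends.
  res <- foreach(k = seq_len(ncol(pairs)), .combine = rbind,
                 .export = "fakeInnerFxn") %dopar% {
    i <- pairs[1, k]
    j <- pairs[2, k]
    c(i, j, fakeInnerFxn(d[c(i, j)]))
  }
  res <- matrix(res, ncol = 5)  # stays a matrix even with a single pair

  out <- array(0, dim = c(n, n, 3),
               dimnames = list(names(d), names(d), c("low", "mid", "high")))
  # Matrix indexing: every row of the index matrix addresses one cell.
  for (s in 1:3) {
    out[cbind(res[, 1], res[, 2], s)] <- res[, 2 + s]
  }
  out
}

registerDoParallel(2)  # arbitrary worker count for the example
d <- list(one = 1, two = 2, three = 3, four = 4)
out <- fakeOuterPar(d)
out["one", "two", ]
```

Because every row carries its own indices, the cells are filled by position rather than by the order in which results come back, and the row and column names are taken from `d` at the end.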