
Fast parallel bipartite distance calculation in R


What is the fastest way to compute bipartite distances in R with a parallelized Rcpp backend?

parallelDist is a great package with a C++ backend and support for multithreading, but to my knowledge it does not support bipartite distance calculations.

A workaround is to stack the two matrices and call parallelDist() on the result. This computes the m1:m1 and m2:m2 blocks in addition to the m1:m2 block we actually want -- highly inefficient.

library(parallelDist)

bipartiteDist <- function(matrix1, matrix2){
  # stack both matrices, compute the full distance matrix in parallel,
  # then keep only the m1:m2 block
  matrix12 <- rbind(matrix1, matrix2)
  d <- parallelDist(matrix12)
  d <- as.matrix(d)[1:nrow(matrix1), (nrow(matrix1) + 1):(nrow(matrix1) + nrow(matrix2))]
  d
}

matrix1 <- abs(matrix(rnorm(10 * 100000), 10, 100000))
matrix2 <- abs(matrix(rnorm(10 * 100000), 10, 100000))

dist <- bipartiteDist(matrix1, matrix2)

This approach is faster than pdist or a pure R implementation when more than 3 cores are available.

pdist is designed exactly for bipartite distance computation, but does not support multithreading.
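
For reference, a minimal single-threaded sketch with pdist (its pdist() function computes Euclidean distances between the rows of its two arguments) might look like:

library(pdist)

# Euclidean distances between each row of matrix1 and each row of matrix2;
# as.matrix() converts the pdist object to an nrow(matrix1) x nrow(matrix2) matrix
dist12 <- as.matrix(pdist(matrix1, matrix2))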

Any fast implementations for parallelized bipartite distance computation?


Solution

  • The wordspace dist.matrix() function supports parallelized calculation of bipartite distances.

    Benchmarking wordspace against parallelDist

    matrix1 <- abs(matrix(rnorm(100 * 100000), 100, 100000))
    matrix2 <- abs(matrix(rnorm(100 * 100000), 100, 100000))
    
    library(rbenchmark)
    library(parallelDist)
    library(wordspace)
    
    bipartiteDist_parallelDist <- function(matrix1, matrix2){
      # stack both matrices, compute the full distance matrix,
      # then extract only the m1:m2 block
      matrix12 <- rbind(matrix1, matrix2)
      d <- parallelDist(matrix12, method = "euclidean")
      d <- as.matrix(d)[1:nrow(matrix1), (nrow(matrix1) + 1):(nrow(matrix1) + nrow(matrix2))]
      d
    }
    
    bipartiteDist_wordspace <- function(matrix1, matrix2){
      wordspace.openmp(threads = wordspace.openmp()$max)  # use all available OpenMP threads
      dist.matrix(matrix1, matrix2, byrow = TRUE, method = "euclidean", convert = FALSE)
    }
    
    benchmark("parallelDist" = {
                bd1 <- bipartiteDist_parallelDist(matrix1,matrix2)
              },
              "wordspace" = {
                bd2 <- bipartiteDist_wordspace(matrix1,matrix2)
              },
              replications = 1,
              columns = c("test", "replications", "elapsed",
                          "relative", "user.self", "sys.self"))
    
    plot(bd1,bd2) # yes, both methods give near-identical results
    

    Benchmarking results:

              test replications elapsed relative user.self sys.self
    1 parallelDist            1   2.120   12.184   126.145    0.523
    2    wordspace            1   0.174    1.000     3.749    0.252
    

    I used 80 threads.

    A framework for further speed gains

    The author of wordspace notes that the package emphasizes low memory load over raw speed, so additional speed gains should be possible (source).

    For instance, here is a general framework for Euclidean distance, based on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b):

    bipartiteDist3 <- function(matrix1, matrix2){
      m1tm2 <- tcrossprod(matrix1, matrix2)  # cross-products (a . b)
      sq1 <- rowSums(matrix1^2)              # squared row norms of matrix1
      sq2 <- rowSums(matrix2^2)              # squared row norms of matrix2
      out0 <- outer(sq1, sq2, "+") - 2 * m1tm2
      sqrt(pmax(out0, 0))                    # clamp tiny floating-point negatives
    }
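
    A quick sanity check of this framework against wordspace, using small hypothetical matrices:

    m1 <- abs(matrix(rnorm(5 * 20), 5, 20))
    m2 <- abs(matrix(rnorm(4 * 20), 4, 20))
    max(abs(bipartiteDist3(m1, m2) -
            dist.matrix(m1, m2, byrow = TRUE, method = "euclidean", convert = FALSE)))
    # should be ~0, up to floating-point error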
    

    I am very interested in a parallelized solution optimized for sparse matrices. To my knowledge, wordspace does not take advantage of sparsity. The building blocks above all have sparse equivalents: there are parallelizable sparse-matrix implementations of tcrossprod, rowSums, and the outer step; a minimal sparse sketch follows.
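
    As a starting point, here is a sketch of the same Euclidean framework on sparse inputs, assuming matrix1 and matrix2 are dgCMatrix objects from the Matrix package (its tcrossprod and rowSums methods are sparse-aware, though single-threaded):

    library(Matrix)

    bipartiteDist_sparse <- function(matrix1, matrix2){
      m1tm2 <- tcrossprod(matrix1, matrix2)  # sparse cross-products (a . b)
      sq1 <- Matrix::rowSums(matrix1^2)      # squared row norms of matrix1
      sq2 <- Matrix::rowSums(matrix2^2)      # squared row norms of matrix2
      out0 <- outer(sq1, sq2, "+") - 2 * as.matrix(m1tm2)  # result is necessarily dense
      sqrt(pmax(out0, 0))                    # clamp tiny floating-point negatives
    }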