r parallel-processing assignment-problem

Linear sum assignment/Hungarian method performance in R

I need to speed up a process of finding most optimal distance for each entry. I am using gower.dist from StatMatch and solve_LSAP from the clue package. The gower distance takes no time at all, however the LSAP solver takes too long with the number of times I need to run it.

Is there a way to make this run faster using parallel computing or just making part of it run in parallel [link to clue github] [link to scientific journal discussing this] or another solver that I may be unaware of that is faster? The other two libraries I am aware of are adagio and RcppHungarian (both are slower).

Example data: Gower Distance Data (google drive link to folder with data)

> dim(gowerdist)
[1]  4309 10366

solve_LSAP(gowerdist, maximum = FALSE)

Solution

clue::solve_LSAP uses the Hungarian algorithm. The Jonker and Volgenant (1987) algorithm, implemented in TreeDist::LAPJV(), is more efficient.

I tested a reduced version of the matrix to get results in seconds rather than minutes; differences in run time are likely to increase with larger matrices.

gower <- read.csv("GowerDistance.csv")
dim(gower)
gowerMat <- as.matrix(gower)
gow1 <- gowerMat[1:2400, 1:4800]

tictoc::tic()
clue <- clue::solve_LSAP(gow1, maximum = FALSE)
tictoc::toc()
# 67.95 sec elapsed

tictoc::tic()
td <- TreeDist::LAPJV(gow1)
tictoc::toc()
# 20.24 sec elapsed

GraphAlignment::LinearAssignment() on Bioconductor also uses the LAPJV algorithm, but can only be applied to square matrices. (A workaround is to add extra rows to the matrix with extremely high values.)

Another alternative is lpSolve; this does not specify which algorithm it uses, but is much slower

tictoc::tic()
lps <- lpSolve::lp.assign(gowerMat[1:800, 1:1600]) # much smaller matrix
tictoc::toc()
# 364.67 sec elapsed