
Fastest way to sort each row of a large matrix in R


I have a large matrix:

set.seed(1)
a <- matrix(runif(9e+07),ncol=300)

I want to sort each row in the matrix:

> system.time(sorted <- t(apply(a,1,sort)))
   user  system elapsed 
  42.48    3.40   45.88 

I have a lot of RAM to work with, but I would like a faster way to perform this operation.
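
To make the operation concrete, here is a small sketch on a toy matrix (the values in `m` are made up for illustration). `apply(m, 1, sort)` returns each sorted row as a column of its result, which is why the call above wraps it in `t()`.

```r
# Toy matrix; hypothetical values chosen for illustration only
m <- matrix(c(3, 1, 2, 9,
              8, 7, 6, 5,
              0, 4, 2, 1), nrow = 3, byrow = TRUE)

# apply() over rows returns one column per input row
by_col <- apply(m, 1, sort)
dim(by_col)      # 4 rows, 3 columns

# Transposing restores the original orientation
sorted_m <- t(by_col)
dim(sorted_m)    # 3 rows, 4 columns
sorted_m[1, ]    # first row in ascending order: 1 2 3 9
```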


Solution

  • I'm not aware of many ways to sort faster in R. The difficulty is that each sort covers only 300 values, but you repeat it 300,000 times. Still, you can squeeze some extra performance out of sort by calling sort.int directly with method='quick':

    set.seed(1)
    a <- matrix(runif(9e+07),ncol=300)
    
    # Your original code
    system.time(sorted <- t(apply(a,1,sort))) # 31 secs
    
    # sort.int with method='quick'
    system.time(sorted2 <- t(apply(a,1,sort.int, method='quick'))) # 27 secs
    
    # a for loop is slightly faster than apply (and avoids the transpose):
    system.time({
      sorted3 <- a
      for (i in seq_len(nrow(a))) sorted3[i,] <- sort.int(a[i,], method='quick')
    }) # 26 secs
    

    A better approach should be to use the parallel package to sort parts of the matrix in parallel. However, the overhead of transferring the data seems to be too large, and my machine starts swapping since it "only" has 8 GB of memory:

    library(parallel)
    cl <- makeCluster(4)
    system.time(sorted4 <- t(parApply(cl,a,1,sort.int, method='quick'))) # Forever...
    stopCluster(cl)
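
One thing that might be worth trying against the transfer overhead described above is to split the rows explicitly into one block per worker and send each block once. The sketch below is an illustration under assumptions, not a benchmarked fix: it uses a much smaller matrix so it runs quickly, the names `a_small`, `sort_rows`, `n_workers` and `chunks` are made up here, and whether it beats the plain for loop on the full matrix depends on core count and available memory.

```r
library(parallel)

# Smaller matrix than in the question so the sketch runs quickly
set.seed(1)
a_small <- matrix(runif(3e+04), ncol = 300)

# Hypothetical helper: sort every row of one block of rows
sort_rows <- function(block) {
  for (i in seq_len(nrow(block))) {
    block[i, ] <- sort.int(block[i, ], method = "quick")
  }
  block
}

n_workers <- 2  # assumption: adjust to the number of cores available
cl <- makeCluster(n_workers)

# Split the row indices into one contiguous chunk per worker
chunks <- splitIndices(nrow(a_small), n_workers)
blocks <- lapply(chunks, function(idx) a_small[idx, , drop = FALSE])

# Each worker receives one block and returns it with sorted rows
sorted_blocks <- parLapply(cl, blocks, sort_rows)
stopCluster(cl)

# Reassemble the blocks in the original row order
sorted_par <- do.call(rbind, sorted_blocks)

# Check against the serial version from the question
stopifnot(isTRUE(all.equal(sorted_par, t(apply(a_small, 1, sort)))))
```

Each worker still needs its own copy of its block, so on a memory-constrained machine the full-size matrix may still cause swapping.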