r performance sorting matrix apply

The fastest way to sort the elements in each row of a matrix?


I have a matrix with a couple million rows and about 40 columns.

I want to sort the elements in each row in decreasing order. Thus, the element with the highest value of each row should be in the first column.

To do this I can use the apply function:

set.seed(1)
mm <- replicate(10, rnorm(20)) #random matrix with 20 rows and 10 columns
mm.sorted <- t(apply(mm, 1, sort, decreasing = TRUE)) # apply() returns each sorted row as a column, so transpose back

But for a very large matrix this approach takes a very long time.

Are there different approaches to speed up the sorting of elements in rows?


Solution

  • You could use package data.table:

    set.seed(1)
    mm <- matrix(rnorm(1000000*40,0,10),ncol=40) 
    library(data.table)
    system.time({
      d <- as.data.table(mm)
      d[, row := .I]
      d <- melt(d, id.vars = "row") #wide to long format
      setkey(d, row, value) #sort
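      # rank within each row: the largest value gets 1, the smallest gets ncol(mm) (decreasing order);
      # rep() is explicit because recent data.table versions do not recycle a length-40 RHS
      d[, variable := rep(ncol(mm):1, times = nrow(mm))]
    
      #back to wide format and coerce to matrix (integer ranks keep the columns in order 1..40)
      msorted <- as.matrix(dcast(d, row ~ variable, value.var = "value")[, row := NULL])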
    })
    #user  system elapsed 
    #4.96    0.59    5.62 
    

    If you could keep it as a long-format data.table (i.e., skip the last step), it would take about 2 seconds on my machine.
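
    A minimal sketch of what staying in long format could look like, on a small matrix. The names `mm_small` and `rank` are made up for illustration, and ordering with `setorder()` plus `rowid()` is one possible way to do it, not necessarily what was timed above:

    ```r
    library(data.table)

    set.seed(1)
    mm_small <- matrix(rnorm(5 * 4), ncol = 4)  # hypothetical small matrix: 5 rows, 4 columns

    d <- as.data.table(mm_small)
    d[, row := .I]                     # remember which row each value came from
    d <- melt(d, id.vars = "row")      # wide to long: columns row, variable, value
    setorder(d, row, -value)           # sort by row, then by value in decreasing order
    d[, rank := rowid(row)]            # position within the row, 1 = largest value

    # the largest value of each row, in row order
    d[rank == 1, value]
    ```

    Each row's sorted values can then be read off by filtering on `row`, without ever casting back to a wide matrix.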

    For comparison, timings of @qjgods' answer on my machine:

    #user  system elapsed 
    #3.71    2.08    8.81
    

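    Note that apply (and parallel versions of it) returns each sorted row as a column, so the result is the transpose of the sorted matrix; wrap it in t() to restore the original orientation.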
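
    To make the transposition concrete, here is a small base R sketch (the names `mm_small`, `sorted_t` and `sorted` are made up for illustration). It only checks dimensions and is not meant as a benchmark:

    ```r
    set.seed(1)
    mm_small <- matrix(rnorm(20 * 10), nrow = 20)  # hypothetical example: 20 rows, 10 columns

    # apply() over rows collects each sorted row as a column of the result
    sorted_t <- apply(mm_small, 1, sort, decreasing = TRUE)
    dim(sorted_t)  # 10 20

    # transpose to get back a 20 x 10 matrix with each row sorted in decreasing order
    sorted <- t(sorted_t)
    dim(sorted)    # 20 10
    ```

    After the transpose, the first column of `sorted` holds the largest value of each row, which is the layout the question asks for.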