I have a matrix with a couple million rows and about 40 columns.
I want to sort the elements in each row in decreasing order, so that the largest element of each row ends up in the first column.
To do this I can use the apply function:
set.seed(1)
mm <- replicate(10, rnorm(20)) #random matrix with 20 rows and 10 columns
mm.sorted <- apply(mm, 1, sort, decreasing = TRUE)
But for a very large matrix this approach takes a very long time.
Are there faster ways to sort the elements within each row?
You could use the data.table package:
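library(data.table)
set.seed(1)
mm <- matrix(rnorm(1000000*40, 0, 10), ncol = 40)
system.time({
  d <- as.data.table(mm)
  d[, row := .I]
  d <- melt(d, id.vars = "row")  # wide to long format
  setkey(d, row, value)          # sort by row, then by value (ascending)
  # label positions in decreasing order: the largest value in each row gets 1;
  # integer labels keep dcast's columns in numeric order, and rep() makes the
  # recycling over all rows explicit
  d[, variable := rep(ncol(mm):1, times = nrow(mm))]
  # back to wide format and coerce to matrix
  msorted <- as.matrix(dcast(d, row ~ variable, value.var = "value")[, row := NULL])
})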
#user system elapsed
#4.96 0.59 5.62
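Before running this on millions of rows, it can help to check the reshaping logic on a small matrix. dcast orders the new columns by sorting the label values, and as strings "V10" sorts before "V2", so with ten or more columns character labels like paste0("V", ...) would scramble the column order. The sketch here uses integer position labels on a 12-column matrix and compares the result with t(apply(...)); the names small, res_dt and res_ref are just for illustration.

```r
library(data.table)

set.seed(1)
small <- matrix(rnorm(6 * 12), ncol = 12)  # 12 columns, so "V10" vs "V2" ordering would matter

d <- as.data.table(small)
d[, row := .I]
d <- melt(d, id.vars = "row")              # long format: row, variable, value
setkey(d, row, value)                      # ascending values within each row
# integer labels: the largest value in each row gets position 1
d[, variable := rep(ncol(small):1, times = nrow(small))]
res_dt <- as.matrix(dcast(d, row ~ variable, value.var = "value")[, row := NULL])

# reference result: sort each row with apply() and transpose back
res_ref <- t(apply(small, 1, sort, decreasing = TRUE))
```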
If you could keep it as a long-format data.table (i.e., skip the last step), it would take about 2 seconds on my machine.
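One way to get such a long-format result directly is to order by row and decreasing value with setorder() and number the positions within each row. This is a sketch on a small matrix; the column name pos is an arbitrary choice, and this variant was not part of the timings above.

```r
library(data.table)

set.seed(1)
mm <- matrix(rnorm(20 * 5), ncol = 5)  # small example: 20 rows, 5 columns

d <- as.data.table(mm)
d[, row := .I]
d <- melt(d, id.vars = "row")          # long format: row, variable, value
setorder(d, row, -value)               # within each row, largest value first
d[, pos := seq_len(.N), by = row]      # pos = 1 marks each row's maximum
```

Each row's sorted values can then be read off with d[row == i, value], and d[pos == 1, value] gives the row maxima in row order.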
For comparison, timings of @qjgods' answer on my machine:
#user system elapsed
#3.71 2.08 8.81
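Note that apply (or parallel versions of it) returns the result transposed: with MARGIN = 1, each sorted row becomes a column, so you need t() to get back the original orientation.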
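A small check of the transposition on the question's example, plus a base-R alternative that avoids calling sort() once per row: order all entries by row index first and by negated value second, then refill the matrix row by row. This is a sketch that has not been benchmarked against the data.table version; the object names are only illustrative.

```r
set.seed(1)
mm <- replicate(10, rnorm(20))   # 20 x 10, as in the question

# apply() over rows returns one column per input row: a 10 x 20 matrix
raw <- apply(mm, 1, sort, decreasing = TRUE)
res_apply <- t(raw)              # back to 20 x 10, each row sorted decreasingly

# Base-R alternative: a single order() call with the row index as primary key
# and the negated value as secondary key, then refill the matrix by row
idx <- order(row(mm), -mm)
res_order <- matrix(mm[idx], nrow = nrow(mm), byrow = TRUE)
```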