
Faster version of combn


Is there a way to speed up the combn command to get all unique combinations of 2 elements taken from a vector?

Usually this would be set up like this:

# Get latest version of data.table
library(devtools)
install_github("Rdatatable/data.table",  build_vignettes = FALSE)  
library(data.table)

# Toy data
d <- data.table(id=as.character(paste0("A", 10001:15000))) 

# Transform data 
system.time({
d.1 <- as.data.table(t(combn(d$id, 2)))
})

However, combn is roughly 8 times slower (23 sec versus 3 sec on my computer) than calculating all possible combinations using data.table, even though the data.table version returns every pair twice, once in each order.

system.time({
d.2 <- d[, list(neighbor=d$id[-which(d$id==id)]), by=c("id")]
})

Because I am dealing with very large vectors, I am looking for a way to save memory by calculating only the unique combinations (as combn does), but with the speed of data.table (see the second code snippet).
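To put the memory concern in numbers, the row counts of the two results can be computed without building either table. This is only a quick sketch; the variable names are illustrative and not part of the code above.

```r
n <- 5000                      # length of d$id in the toy data

# Unique unordered pairs, as returned by combn(d$id, 2)
unique_pairs <- choose(n, 2)   # 12497500 rows

# Ordered pairs, as returned by the data.table snippet (each pair appears twice)
ordered_pairs <- n * (n - 1)   # 24995000 rows
```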

I appreciate any help.


Solution

  • You could use combnPrim from the gRbase package:

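    source("http://bioconductor.org/biocLite.R")
    biocLite("gRbase") # will install dependent packages automatically.
    # Note: biocLite is retired on current Bioconductor releases;
    # BiocManager::install("gRbase") is the replacement there.
    library(gRbase)
    
    system.time({
      d.1 <- as.data.table(t(combn(d$id, 2)))
    })
    #   user  system elapsed 
    # 27.322   0.585  27.674 
    
    system.time({
      d.2 <- as.data.table(t(combnPrim(d$id, 2)))
    })
    #   user  system elapsed 
    #  2.317   0.110   2.425 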
    #   user  system elapsed 
    #  2.317   0.110   2.425 
    
    identical(d.1[order(V1, V2),], d.2[order(V1,V2),])
    #[1] TRUE
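
If installing gRbase is not an option, the pairs-only case can also be handled in base R by building the two index vectors directly instead of calling combn. The following is a minimal sketch, not a benchmarked replacement: `pairs_fast` is a hypothetical helper name, and no timing is claimed for it here.

```r
# Hypothetical helper: all unique pairs of x, in the same order as combn(x, 2).
pairs_fast <- function(x) {
  n <- length(x)
  if (n < 2L) {
    # Fewer than two elements: no pairs exist
    return(data.frame(V1 = x[0L], V2 = x[0L], stringsAsFactors = FALSE))
  }
  # First index: 1 repeated (n - 1) times, 2 repeated (n - 2) times, ...
  i <- rep.int(seq_len(n - 1L), times = (n - 1L):1L)
  # Second index: 2..n, then 3..n, ..., then n
  j <- sequence((n - 1L):1L) + i
  data.frame(V1 = x[i], V2 = x[j], stringsAsFactors = FALSE)
}

# Small check against combn on a five-element vector
x <- paste0("A", 10001:10005)
p <- pairs_fast(x)
stopifnot(all(cbind(p$V1, p$V2) == t(combn(x, 2))))
```

The result is a plain data.frame; calling data.table::setDT(p) converts it to a data.table by reference. The two integer index vectors each have choose(n, 2) elements, so peak memory still grows quadratically with the length of the input.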