Search code examples
rpermutationcombinatorics

How to efficiently generate unique permutation with R


I have the following code that generate the unique permutation:

library(magrittr)
library(tictoc)

count_unique_perm <- function(l = NULL) {
  lo <- combinat::permn(l)
  do.call(rbind, lapply(lo, paste0, collapse = ""))[, 1] %>%
    unique() %>%
    length()
}

It already give the correct result. With this input:

l1 <- c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P") # 720
l2 <- c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S") # 30,240

But it's running extremely slow.

tic()
count_unique_perm(l = l1)
toc()
#118.155 sec elapsed

#107.793 sec elapsed for l2

How can I speed it up?


Solution

  • Try the RcppAlgos package, which will return permutations of multisets by using the freqs argument.

    library(RcppAlgos)
    library(microbenchmark)
    
    # get a matrix of unique permutations
    x <- table(c("R", "R", "R", "R", "R", "R", "R", "E", "K", "P"))
    y <- table(c("R", "R", "R", "R", "R", "Q", "G", "K", "M", "S"))
    
    microbenchmark(permx = permuteGeneral(names(x), freqs = x),
                   permy = permuteGeneral(names(y), freqs = y))
    #> Unit: microseconds
    #>   expr    min     lq     mean  median      uq    max neval
    #>  permx   32.3   38.0   44.018   42.05   47.95   64.8   100
    #>  permy 1538.8 1567.7 1751.259 1606.60 1649.35 5082.5   100
    dim(permuteGeneral(names(x), freqs = x))
    #> [1] 720  10
    dim(permuteGeneral(names(y), freqs = y))
    #> [1] 30240    10
    

    To get just the number of unique permutations, use permuteCount.

    microbenchmark(permx = permuteCount(names(x), freqs = x),
                   permy = permuteCount(names(y), freqs = y))
    #> Unit: microseconds
    #>   expr min  lq  mean median  uq  max neval
    #>  permx 1.5 1.6 1.791    1.6 1.8  6.6   100
    #>  permy 1.5 1.6 2.260    1.7 1.8 46.2   100
    permuteCount(names(x), freqs = x)
    #> [1] 720
    permuteCount(names(y), freqs = y)
    #> [1] 30240