Finding all unique combinations of 1:n numbers without packages


I need to write a function that returns all unique pairwise combinations of the numbers 1:n, where n is the function's argument. I need to do this without using the combn function or any other pre-installed function in R.

The original image (not reproduced here) showed what I want to do; its bottom part just used combn to check that the function works.

I tried the following, but it is obviously not right yet:

pairwise_comp <- function(n) {
  res <- matrix(nrow = 0, ncol = 2)
  for (i in 1:n) {
    res <- rbind(res, cbind(i, i + 1))
  }
  return(res)
}

Solution

  • There are several ways to attack this: some are efficient, some are readable (subjective), and not many are both.

    For instance, you can do it recursively, like so:

    pairwise_recur <- function(n, start = 1) {
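      if (n == start) return(NULL)  # base case: no pairs left to add
      # Row count for all pairs; computed for illustration but not used below
      nrows <- factorial(n) / (factorial(2) * factorial(n-2))
      res <- matrix(nrow = nrows, ncol = 2)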
      rbind(
        cbind(rep(start, times = n - start),
              1 + start:(n-1)),
        pairwise_recur(n, start = start + 1)
      )
    }
    pairwise_recur(4)
    #      [,1] [,2]
    # [1,]    1    2
    # [2,]    1    3
    # [3,]    1    4
    # [4,]    2    3
    # [5,]    2    4
    # [6,]    3    4
    

    But this approach has several drawbacks:

    1. R does not optimize recursion, so each level adds a frame to the call stack; theoretically a large enough n could fill the stack and exhaust R.
    2. It does what I suggested not to do in my comment: it calls rbind repeatedly, and each call copies the accumulated matrix.
    3. It is error-prone: if you call it with n < start (which includes n == 0), it will fail.

    And quite possibly:

    1. If you are not able to use factorial in this fashion, you can replace it with the equivalent prod(1:n). The remaining functions below use this prod method; which you prefer is up to you.
    2. Both factorial and prod will start failing with really high n (doubles overflow past 170!), likely well beyond the limit you are going to use for this assignment. At those sizes it will likely be necessary to go into the gamma realm (e.g., lgamma), which offers more efficient calculations for high-n factorials (and will likely stay necessary until R is fully 64-bit-integer friendly).
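
    As a rough sketch of that last point (the function names here are mine, purely for illustration): the pair count simplifies algebraically to n * (n - 1) / 2, which never forms a factorial at all, and lgamma is one way to stay on the log scale if you do want the gamma route.

```r
# Hypothetical helper names; each returns the number of unordered pairs from 1:n.
n_pairs_factorial <- function(n) factorial(n) / (factorial(2) * factorial(n - 2))
n_pairs_prod      <- function(n) prod(seq_len(n)) / (prod(1:2) * prod(seq_len(n - 2)))

# Algebraic simplification of n! / (2! * (n-2)!): no large intermediate values.
n_pairs_direct    <- function(n) n * (n - 1) / 2

# Log-gamma version: works on the log scale, so the factorials never overflow.
# round() removes the small floating-point error introduced by exp().
n_pairs_lgamma    <- function(n) round(exp(lgamma(n + 1) - lgamma(3) - lgamma(n - 1)))

n_pairs_direct(4)     # 6
n_pairs_direct(1000)  # 499500
# For n this large, the factorial and prod versions evaluate Inf / Inf, i.e. NaN.
```

    For this assignment any of these would do; the direct formula is simply the one with the fewest ways to go wrong.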

    An iterative version that fixes some of that might be:

    pairwise_iter <- function(n) {
      nrows <- prod(1:n) / ( prod(1:2) * prod(1:(n-2)) )
      res <- matrix(nrow = nrows, ncol = 2)
      r <- 0
      for (i in 1:(n-1)) {
        for (j in (i+1):n) {
          r <- r + 1
          res[r,1] <- i
          res[r,2] <- j
        }
      }
      res
    }
    # same output
    

    And frankly, one can get rid of the r counter with some clever math on i and j.
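
    One way to read that "clever math" (my own sketch, with a hypothetical function name): the pairs for smaller values of i occupy (i - 1) * n - i * (i - 1) / 2 rows, so the row for pair (i, j) can be computed directly instead of counted. I have also assumed the simplified row count n * (n - 1) / 2 and an early return for n <= 1.

```r
# Sketch only: same double loop, but the row index is computed from i and j.
pairwise_iter_idx <- function(n) {
  if (n <= 1) return(matrix(nrow = 0, ncol = 2))
  res <- matrix(nrow = n * (n - 1) / 2, ncol = 2)
  for (i in 1:(n - 1)) {
    # Rows already used by 1..(i-1): sum of (n - k) for k = 1..(i-1)
    offset <- (i - 1) * n - i * (i - 1) / 2
    for (j in (i + 1):n) {
      r <- offset + (j - i)   # position of (i, j) within the block for i
      res[r, 1] <- i
      res[r, 2] <- j
    }
  }
  res
}

pairwise_iter_idx(4)
```

    Whether this is clearer than a running counter is debatable; it mostly shows that the counter is not strictly needed.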

    But pairwise_iter is still prone to problems when n < 3. This can be mitigated with:

    pairwise_iter2 <- function(n) {
      if (n <= 1) return(matrix(nrow = 0, ncol = 2))
      nrows <- prod(seq_len(n)) / ( prod(1:2) * prod(seq_len(n-2)) )
      res <- matrix(nrow = nrows, ncol = 2)
      r <- 0
      for (i in 1:(n-1)) {
        for (j in (i+1):n) {
          r <- r + 1
          res[r,1] <- i
          res[r,2] <- j
        }
      }
      res
    }
    
    pairwise_iter2(0)
    #      [,1] [,2]
    pairwise_iter2(1)
    #      [,1] [,2]
    pairwise_iter2(2)
    #      [,1] [,2]
    # [1,]    1    2
    pairwise_iter2(3)
    #      [,1] [,2]
    # [1,]    1    2
    # [2,]    1    3
    # [3,]    2    3
    

    One difference from pairwise_iter is the use of seq_len: if you want a sequence of length n, then 1:n is accurate only as long as n >= 1. If n is 0, then 1:0 produces a vector of length 2 (it counts down: 1, 0), which is not what you should get; seq_len(0) instead returns a vector of length 0, which is more consistent. The leading if/return already covers n <= 1, but seq_len(n-2) is what makes n = 2 work: prod of an empty vector is 1, whereas prod(1:0) is 0.
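
    A small illustration of the 1:n versus seq_len(n) difference at n = 0 (count_iterations is a throwaway helper I made up for this demo):

```r
n <- 0
bad  <- 1:n          # counts down: c(1, 0)
good <- seq_len(n)   # integer(0)

length(bad)    # 2
length(good)   # 0
prod(bad)      # 0
prod(good)     # 1 (the product of an empty vector)

# Hypothetical helper: how many times does a for loop over idx run?
count_iterations <- function(idx) {
  k <- 0
  for (i in idx) k <- k + 1
  k
}
count_iterations(1:n)          # 2, the loop body runs for i = 1 and i = 0
count_iterations(seq_len(n))   # 0, the loop body is skipped
```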


    This is still not "efficient" in the R way of doing things. For that, you can remove the inner for loop and assign by vectors:

    pairwise_vec1 <- function(n) {
      if (n <= 1) return(matrix(nrow = 0, ncol = 2))
      nrows <- prod(seq_len(n)) / ( prod(1:2) * prod(seq_len(n-2)) )
      res <- matrix(nrow = nrows, ncol = 2)
      r <- 0
      for (i in 1:(n-1)) {
        vec <- seq_len(n - i)
        res[r + vec, 1] <- i
        res[r + vec, 2] <- i + vec
        r <- r + length(vec)
      }
      res
    }
    

    It is actually possible to generate this without even the outer for loop, but that requires a bit more vectorized wizardry, which is outside the scope of this assignment and more time than I can dedicate to this lesson.
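
    For reference, one possible loop-free version might look like the sketch below. It assumes that rep and sequence are acceptable for the assignment, and pairwise_vec2 is just an illustrative name.

```r
# Sketch only: builds both columns with vectorized calls, no explicit loops.
pairwise_vec2 <- function(n) {
  if (n <= 1) return(matrix(nrow = 0, ncol = 2))
  counts <- (n - 1):1                          # pairs starting with 1, 2, ..., n-1
  first  <- rep(seq_len(n - 1), times = counts)
  second <- first + sequence(counts)           # sequence() gives 1..k for each count k
  cbind(first, second, deparse.level = 0)      # deparse.level = 0: no column names
}

pairwise_vec2(4)
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    3
# [3,]    1    4
# [4,]    2    3
# [5,]    2    4
# [6,]    3    4
```

    As in the question, comparing against t(combn(n, 2)) is a quick way to check any of these functions.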