Search code examples
rmatrixconditional-statementssparse-matrixcombinatorics

Drawing conditional combinations of a binary vector one by one


I am trying to write a routine to find combinations conditionally of a binary vector. For example, consider the following vector:

> A <- rep(c(1,0,0),3)
> A
[1] 1 0 0 1 0 0 1 0 0

Note that, length of the vector A is always multiple of 3. So the following condition always holds:

length(A) %% 3 == 0

The main condition is that there must be only a single 1 in each set of 3 vectors consecutively. In this example, for instance, one element of A[1:3] will be 1, one element of A[4:6] will be 1 and one element of A[7:9] will be 1 and the rest are all 0. Therefore, for this example, there will be a total of 27 possible combinations.

Objective is to make a routine to draw/return the next valid combination until all the possible legal combinations are returned.

Note that, I am not looking for a table with all the possible combinations. That Solution is already available in my other query in StackOverflow. However, with that method, I am running into memory problems when going beyond more than a length of 45 elements in A, as it is returning the full matrix which is huge. Therefore instead of storing the full matrix, I want to retrieve one combination at a time, and then decide later if I want to store it or not.


Solution

  • What the OP is after is an iterator. If we were to do this properly, we would write a class in C++ with a get_next method, and expose this to R. As it stands, with base R, since everything is passed by value, we must call a function on our object-to-be-updated and reassign the object-to-be-updated every time.

    Here is a very crude implementation:

    get_next <- function(comb, v, m) {
        s <- seq(1L, length(comb), length(v))
        e <- seq(length(v), length(comb), length(v))
        
        last_comb <- rev(v)
        can_be_incr <- sapply(seq_len(m), function(x) {
            !identical(comb[s[x]:e[x]], last_comb)
        })
    
        if (all(!can_be_incr)) {
            return(FALSE)
        } else {
            idx  <- which(can_be_incr)[1L]
            span <- s[idx]:e[idx]
            j <- which(comb[span] == 1L)
            comb[span[j]] <- 0L
            comb[span[j + 1L]] <- 1L
            
            if (idx > 1L) {
                ## Reset previous maxed out sections
                for (i in 1:(idx - 1L)) {
                    comb[s[i]:e[i]] <- v
                }
            }
        }
        
        return(comb)
    }
    

    And here is a simple usage:

    m <- 3L
    v <- as.integer(c(1,0,0))
    comb <- rep(v, m)
    count <- 1L
    
    while (!is.logical(comb)) {
        cat(count, ": ", comb, "\n")
        comb <- get_next(comb, v, m)
        count <- count + 1L
    }
    
    1 :  1 0 0 1 0 0 1 0 0 
    2 :  0 1 0 1 0 0 1 0 0 
    3 :  0 0 1 1 0 0 1 0 0 
    4 :  1 0 0 0 1 0 1 0 0 
    5 :  0 1 0 0 1 0 1 0 0 
    6 :  0 0 1 0 1 0 1 0 0 
    7 :  1 0 0 0 0 1 1 0 0 
    8 :  0 1 0 0 0 1 1 0 0 
    9 :  0 0 1 0 0 1 1 0 0 
    10 :  1 0 0 1 0 0 0 1 0 
    11 :  0 1 0 1 0 0 0 1 0 
    12 :  0 0 1 1 0 0 0 1 0 
    13 :  1 0 0 0 1 0 0 1 0 
    14 :  0 1 0 0 1 0 0 1 0 
    15 :  0 0 1 0 1 0 0 1 0 
    16 :  1 0 0 0 0 1 0 1 0 
    17 :  0 1 0 0 0 1 0 1 0 
    18 :  0 0 1 0 0 1 0 1 0 
    19 :  1 0 0 1 0 0 0 0 1 
    20 :  0 1 0 1 0 0 0 0 1 
    21 :  0 0 1 1 0 0 0 0 1 
    22 :  1 0 0 0 1 0 0 0 1 
    23 :  0 1 0 0 1 0 0 0 1 
    24 :  0 0 1 0 1 0 0 0 1 
    25 :  1 0 0 0 0 1 0 0 1 
    26 :  0 1 0 0 0 1 0 0 1 
    27 :  0 0 1 0 0 1 0 0 1
    

    Note, this implementation will be memory efficient, however it will be very slow.