Search code examples
ralgorithmperformancematrixigraph

Efficiently enumerate all possible matrices with given constraints


Background

Assuming we have a family of matrices Ms of size n-by-n, which should meet the following requirements:

  1. Entries of the matrix are either 0 or 1, i.e., boolean, but diagonal entries are always 0s
  2. The matrix is symmetric, i.e., M == t(M)
  3. There is a constant row (or equivalently, column) sum constraint p, such that all(rowSums(M)==p) == TRUE

Questions

  • Is there any potential features from the particular matrix structure, e.g., symmetry, boolean, or somethings else, such that we can take benefits from that to improve the efficiency of searching?
  • It seems that the problem can be interpreted as a graph theory way. For example, the matrix is an adjacent matrix of a graph that consists of n vertices with both indegrees and outdegrees being equal to p. This can be done by sample_degseq, but we may have to find all of its isomorphic mappings. How can we do that if we use igraph approaches?

My code so far looks like below, but it is slow when we have larger n or p (and also I am not sure if some of matrices are missed during the enumeration).

f <- function(n, p) {
    # helper function to check if requirement holds
    checker <- function(M, p, i = nrow(M) - 1) {
        rs <- rowSums(M)
        if ((i == nrow(M) - 1)) {
            return(all(rs == p))
        } else {
            return(all(rs[1:i] == p) && all(rs[-(1:i)] <= p))
        }
    }

    # start from all-0's matrix
    lst <- list(matrix(0, n, n))
    for (i in 1:(n - 1)) {
        js <- (i + 1):n
        r <- list()
        for (mat in lst) {
            k <- p - sum(mat[i, ])
            if (k == 0) {
                if (checker(mat, p, i)) {
                    r <- c(r, list(mat))
                }
            }
            if (k > 0 && length(js) >= k) {
                idx <- combn(length(js), k, \(v) js[v], simplify = FALSE)
                for (u in idx) {
                    mm <- mat
                    mm[i, u] <- 1
                    mm[u, i] <- 1
                    if (checker(mm, p, i)) {
                        r <- c(r, list(mm))
                    }
                }
            }
        }
        lst <- r
    }
    lst
}

Examples

  • For n <- 4 and p <- 2, we can find 3 matrices
[[1]]
     [,1] [,2] [,3] [,4]
[1,]    0    1    1    0
[2,]    1    0    0    1
[3,]    1    0    0    1
[4,]    0    1    1    0

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    1    0
[3,]    0    1    0    1
[4,]    1    0    1    0

[[3]]
     [,1] [,2] [,3] [,4]
[1,]    0    0    1    1
[2,]    0    0    1    1
[3,]    1    1    0    0
[4,]    1    1    0    0
  • For n <- 3 and p <- 2, we can find only 1 matrix
[[1]]
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    1
[3,]    1    1    0

Solution

  • The question boils down to generating all realizations of a degree sequence as a simple undirected graphs. igraph presently contains the following related tools:

    • realize_degseq() creates a single realization (we want all).
    • sample_degseq() does a random sampling of realizations, and some of the methods it implements ensure uniform sampling.
    • is_graphical() tests if any realization exists at all. If it does, we call the degree sequence graphical. This will be the basic building block for our algorithm below.

    Suppose our degree sequence is {2, 3, 1, 2}. We can draw this as follows:

    enter image description here

    To generate all realizations, we need to connect up the stubs in all possible ways. Some of these ways will lead to multi-edges or self-loops, i.e. a non-simple graphs. We need to discard these.

    The trick to efficient generation of all simple realizations is to detect early if we'd be forced to create a non-simple graph and stop. We can do this as follows. Let us take the first vertex, which has d_1 = 2 stubs. There are n-1 = 3 other vertices we can connect these two stubs to. There are (n-1) \choose d_1 = 3 \choose 2 = 3 ways to do so, namely the following:

    enter image description here

    enter image description here

    enter image description here

    Can we reach a simple graph from all three of these configurations? Notice that after connecting all stubs of vertex 1, what remains is a degree sequence of vertices 2, 3, 4, which may or may not be graphical. We call this the reduced degree sequence. For these three cases, we have:

    2 0 2
    2 1 1
    3 0 1
    

    (Discarding the first vertex whose degree now became 0, and no longer needs to be considered.)

    Only the second of these is graphical, so we only need to continue the construction along this path.

    > is_graphical(c(2,0,2))
    [1] FALSE
    > is_graphical(c(2,1,1))
    [1] TRUE
    > is_graphical(c(3,0,1))
    [1] FALSE
    

    Taking then the second remaining degree sequence, we can recursively continue the construction in the same manner, obtaining the single possible realization for our starting degree sequence:

    enter image description here

    Expressing this recursive constructions in pseudocode:

    # degrees is a vector of degrees
    # vertices is a vector of corresponding vertex names
    # generate(degrees, vertices) returns all realizations 
    # of the given degrees as simple graphs
    generate(degrees, vertices):
        RESULT = {} # empty list
        d1 = degrees[1]  # first degree
        v1 = vertices[1] # first vertex
        drest = degrees[2:]  # rest of degrees
        vrest = vertices[2:] # rest of vertices
        for each size d1 subset S of vrest:
            dreduced = drest
            dreduced[S] -= 1 # decrement by one elements in S to obtain the reduced degree sequence
            if is_graphical(dreduced):
                for each graph in generate(dreduced, vrest),
                add vertex v1 to it and connect it to each of S,
                then append the graph to RESULT
        return RESULT
    

    I have an implementation of this in Mathematica, but I am not fluent enough in R to be able to take the time to implement it in that language.


    I should note that there are more efficient ways to do the enumeration, but they are a bit more complicated. Here we take the first vertex, and connect all its stubs in all possible ways to the rest of vertices. Then out of these possibilities, we only consider the feasible ones, i.e. those that led to a graphical reduced degree sequence. It is possible to take just a single stub of the first vertex, connect it, and already decide if it is feasible to continue. This can be done using the star-constrained graphicality theorem from this paper.