I have many non-square matrices like this example:
1 1 0
1 1 0
1 1 0
1 0 1
I'd like a general solution that finds the largest densely connected region in these matrices. So for my example the solution would return rows=c(1, 2, 3), columns=c(1,2)
. That said, I'm okay with non-optimal solutions, i.e. local minima are fine.
I think this is similar to the max-clique problem. However, my matrices aren't square and they don't represent graphs, so I'm having trouble using network tools for cliques like igraph::cliques()
. How do I find densely connected regions of non-square matrices?
For clarification by "dense region" I mean any rectangular block in the matrix that contains all 1s, and this can be arrived at by reordering rows and columns. So ordering of rows and columns in the original matrix doesn't matter and I want to consider all permutations of orderings. I'm really looking for regions that are analogous/equivalent to cliques in an adjacency matrix, but, again, these matrices aren't square.
You are looking for maximal bicliques. igraph does not support these directly yet, but feel free to open a feature request on GitHub here. If you do, it'd be nice if you could look up and reference some algorithms to compute it.
For now, we can reduce this to a standard clique problem by interpreting this non-square matrix as a non-diagonal piece of a symmetric square matrix. This is your matrix:
We transform it to this:
White represents zero, any other colour represents one. I used two non-white colours to make it clear where the original matrix appears.
The original rows 1-4 are still 1-4, the original columns 1-3 are now 5-7.
We can now look for maximal cliques which contain both "row vertices" (1-4) and "column vertices" (5-7).
To implement this with igraph, you can:
m
, take m <- 1 - m
.graph_from_incidence_matrix(m)
to get a graph corresponding to the complement.maximal_ivs()
. But as I recall, this still uses an inefficient implementation in igraph. So I recommend taking the complementer()
graph and looking for max_cliques()
instead.This what we get:
1, 2, 3, 4, 5
, which is rows 1-4 and column 11, 2, 3, 5, 6
, which is rows 1-3 and columns 1-2 (your example solution)4, 5, 7
, which is row 4 and columns 1, 35, 6, 7
, which we throw away because it contains only columns, and no rows.I'll leave the detailed solution to you as I'm not much of an R programmer. I did this using IGraph/M in Mathematica.
First steps of an R solution, still needs filtering of results:
library(igraph)
m <- matrix(c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1), nrow = 4, ncol = 3, byrow = TRUE)
g <- graph_from_incidence_matrix(1 - m)
# one way:
maximal_ivs(g)
# another way, likely faster:
max_cliques(complementer(g))
# we are basically finding cliques in:
1-as_adjacency_matrix(g)