Search code examples
algorithmmatrixmathematical-optimization

Finding a large non-sparse submatrix in a large sparse matrix


I have a very large N x P binary matrix, and I'd like to find two sets R and C which contain the indexes of the rows and columns that make up the largest possible dense (non-sparse, made up of only 1's) submatrix, which meets at least a certain size for |R| and |C|. The indices need not necessarily be contiguous. For example, with minimum |R|=3 |C|=2, for the matrix below I'd like to output R = {1,2,3} C = {2,5}.

0 1 1 0 1

1 1 0 1 1

0 1 0 1 1

Are there any good algorithms to do this?

One thought I had was to model this as an optimization problem, I don't know if a problem like this could be written in AMPL to use with any open source solver, for example, something along the lines of (not sure how to state it precisely):

optimization

But I suspect it's NP-hard or requiring a power set search. Or could this problem be also seen as an adjacency matrix and trying to find a maximal clique? Any ideas are welcome!


Solution

  • I ended up realizing that this problem can be reformulated as a bipartite graph, and then the maximum size submatrix would be equivalent to a maximum edge biclique in bipartite graph, which is an NP-complete problem. I found the BiMax implementation in the R package biclust very useful!