Search code examples
algorithmpseudocode

Is there an efficient algorithm to find the "maximal connected set"?


Given a 2D table of boolean values describing connections between pairs of nodes, is there an efficient way to find the largest subset of nodes in which all nodes are connected to all nodes?

Example with 6 nodes:

enter image description here

In this case, the "maximal connected set" is {node1, node4, node5}. Although node0 is connected to node2 and node3, node2 and node3 are not connected, so do not form a "connected set".

This is a small example, but I am interested in a general algorithm that could in principle be applied to very large tables.

In case it helps, my objective is to reproduce the Mn values from Table I of this paper: Sarwate, D.V., and M.B. Pursley, "Crosscorrelation Properties of Pseudorandom and Related Sequences," Proc. IEEE, Vol. 68, No. 5, May, 1980, pp. 583-619.

I will be coding this in MATLAB, but am also fairly fluent in C/C++.

EDIT: Here is the table I want to reproduce results from: enter image description here

  • The 1st and 2nd columns aren't really important here (just describe lengths of the codes).
  • The 3rd column is what I referred to as number of "nodes".
  • The 4th column is a measure of error if you use all nodes (regardless of whether they're "connected" or not).
  • The 6th column is the (minimum) error if only the "maximal connected set" is used.
  • The 5th column, Mn describes the number of nodes in the maximal connected set.

Solution

  • Your problem is equivalent to - in fact, it could even be considered as a restatement of - the maximal clique problem in graph theory. Graph theory deals with exactly the structures you are talking about: nodes connected together, which are called graphs, and one way to represent them is what you have above, which is called an adjacency matrix.

    A "maximal clique" is exactly what you've described: the largest subset of the graph's nodes whom are such that each node is connected to every other.

    This problem is "NP-complete", which is basically a whole class of problems that are widely conjectured, but not proven, to not be solvable "efficiently": in particular, what that means is that with regard to the strongest conjectures made in that vein, which have plausibility arguments, these problems, at bare minimum, are exponentially time-consuming. That is, you essentially can't do much better than just exhaustively searching your whole graph, at least in the general case. That said, for a table this small, an exhaustive search of all the nodes and connections is still essentially instant even for a home computer, but on anything more than a relatively small scale, will become infeasible even for a supercomputer.