Tags: python, python-3.x, graph, networkx, graph-theory

How to efficiently find pairs of nodes in a graph that when removed will cause the graph to split?


Consider this simple biconnected graph (a graph without articulation points):

import networkx as nx

G = nx.diamond_graph()
nx.draw(G, with_labels=True)

diamond graph
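
For reference, the biconnectedness can be verified directly:

print(nx.is_biconnected(G))             # True: no single node disconnects this graph
print(list(nx.articulation_points(G)))  # []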

I want to find out which pairs of nodes, when removed from the graph, will cause the graph to split. For example, removing nodes 1 and 2 from this graph leaves two components: node 0 and node 3. To do this, I remove each combination of two nodes in turn and count the number of components left in the graph.

from itertools import combinations

breaking_combinations = []
combos = combinations(G.nodes(), 2)
for combo in combos:
    G_copy = G.copy()
    init_num_components = nx.number_connected_components(G_copy)
    G_copy.remove_nodes_from(combo)
    num_components_after_combo_removed = nx.number_connected_components(G_copy)
    if num_components_after_combo_removed > init_num_components:
        breaking_combinations.append(combo)

print(breaking_combinations)

The logic works and gives the expected result in this example: (1, 2). But this process is so slow on big graphs (~10,000 nodes) that checking all combinations takes forever.

My question is: is there a more efficient approach that achieves the same result, perhaps using built-in networkx algorithms?


Solution

  • To expand on my comment above: for a graph without articulation points, removing the pair (u, v) splits the graph exactly when v is an articulation point of the graph with u removed, so it is enough to remove each node in turn and let nx.articulation_points find its partners:

    import networkx as nx
    
    G = nx.diamond_graph()
    
    solutions = []
    for removed_node in G:
        H = G.subgraph([node for node in G if node != removed_node]) # subgraph views are much faster to initialize than copies
        solutions.extend([(removed_node, node) for node in nx.articulation_points(H)])
    print(solutions)
    # [(1, 2), (2, 1)]
    

    Edit

    I also did a little bit of testing. The results are encouraging.

    import networkx as nx
    
    from itertools import combinations
    
    
    def find_articulation_pairs(G):
        solutions = []
        for removed_node in G:
            H = G.subgraph([node for node in G if node != removed_node]) # subgraph views are much faster to initialize than copies
            solutions.extend([(removed_node, node) for node in nx.articulation_points(H)])
        return solutions
    
    
    def find_articulation_pairs_baseline(G):
        # the original approach with minor improvements
        total_components = nx.number_connected_components(G) # no need to compute this number on each iteration
        solutions = []
        for removed_nodes in combinations(G, 2):
            H = G.subgraph([node for node in G if node not in removed_nodes])
            if nx.number_connected_components(H) > total_components:
                solutions.append(removed_nodes)
        return solutions
    
    
    def test_correctness():
        G = nx.diamond_graph()
        print(find_articulation_pairs(G))
        print(find_articulation_pairs_baseline(G))
    
    
    def test_running_time():
        import time
    
        # large random graph
        G = nx.erdos_renyi_graph(100, 0.1)
    
        # largest biconnected subgraph
        H = G.subgraph(max(nx.biconnected_components(G), key=len))
    
        tic = time.time()
        _ = find_articulation_pairs(H)
        toc = time.time()
        _ = find_articulation_pairs_baseline(H)
        tac = time.time()
    
        print(f"Baseline time: {tac-toc:.2f}")
        print(f"Improved time: {toc-tic:.2f}")
    
    
    if __name__ == "__main__":
    
        test_correctness()
        test_running_time()
    
    

    Running the script results in the following output:

    [(1, 2), (2, 1)]
    [(1, 2)]
    Baseline time: 14.72
    Improved time: 0.44
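
    The two result lists agree up to ordering: the improved approach reports each pair in both orders, while the baseline reports each unordered pair once. If unordered pairs are needed, the output can be deduplicated afterwards, for example:

    unique_pairs = {tuple(sorted(pair)) for pair in find_articulation_pairs(nx.diamond_graph())}
    print(unique_pairs)
    # {(1, 2)}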
    

    Edit #2

    Both approaches are trivial to parallelize, so depending on the number of available cores, there is another speed-up of a factor of 4-12 to be had.

    Some back-of-the-envelope calculations suggest that for 10k nodes, the running time of my improved approach (on my machine, without parallelization) would still be around 2 days, so parallelization is probably not optional but a necessity.
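
    For reference, here is a minimal sketch of such a parallelization using the standard library's multiprocessing module. The helper names (find_articulation_pairs_parallel, _pairs_for_node, _init_worker) are just illustrative choices, not networkx API, and each worker keeps its own copy of the graph.

    import networkx as nx
    from multiprocessing import Pool

    _G = None  # per-worker copy of the graph, set once by the pool initializer


    def _init_worker(G):
        global _G
        _G = G


    def _pairs_for_node(removed_node):
        # same per-node step as find_articulation_pairs, run in a worker process
        H = _G.subgraph([node for node in _G if node != removed_node])
        return [(removed_node, node) for node in nx.articulation_points(H)]


    def find_articulation_pairs_parallel(G, processes=None):
        # processes=None lets multiprocessing use all available cores
        with Pool(processes, initializer=_init_worker, initargs=(G,)) as pool:
            chunks = pool.map(_pairs_for_node, list(G))
        return [pair for chunk in chunks for pair in chunk]


    if __name__ == "__main__":
        print(find_articulation_pairs_parallel(nx.diamond_graph()))
        # [(1, 2), (2, 1)]

    Splitting the work into per-node tasks keeps the pickled payload small: the graph is shipped to each worker once via the initializer, and only a single node is sent per task.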