Consider this simple biconnected graph (a connected graph with no articulation points):
import networkx as nx
G = nx.diamond_graph()
nx.draw(G, with_labels=True)
I want to find out which pairs of nodes, when removed from the graph, will cause it to split into multiple components. For example, removing nodes 1 and 2 from this graph leaves 2 components: node 3 and node 0. To do this, I remove each combination of two nodes in turn and count the number of components left in the graph.
from itertools import combinations

breaking_combinations = []
combos = combinations(G.nodes(), 2)
for combo in combos:
    G_copy = G.copy()
    init_num_components = nx.number_connected_components(G_copy)
    G_copy.remove_nodes_from(combo)
    num_components_after_combo_removed = nx.number_connected_components(G_copy)
    if num_components_after_combo_removed > init_num_components:
        breaking_combinations.append(combo)
print(breaking_combinations)
The logic works fine and I am getting the expected result, (1, 2), in this example. But this process is so slow on big graphs (~10,000 nodes) that it takes forever to check all combinations.
My question is: is there a more efficient approach I can use to achieve the same results - perhaps by using built-in networkx algorithms?
To expand on my comment above:
import networkx as nx

G = nx.diamond_graph()

solutions = []
for removed_node in G:
    # subgraph views are much faster to initialize than copies
    H = G.subgraph([node for node in G if node != removed_node])
    solutions.extend([(removed_node, node) for node in nx.articulation_points(H)])

print(solutions)
# [(1, 2), (2, 1)]
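Note that each breaking pair appears twice, once per ordering. If you want unordered pairs, a minimal way to deduplicate them is sketched below (this snippet is my own addition, and it assumes the node labels are sortable, which holds for the integer labels here):

# deduplicate the ordered pairs into unordered ones
unique_pairs = sorted({tuple(sorted(pair)) for pair in solutions})
print(unique_pairs)
# [(1, 2)]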
I also did a little bit of testing. The results are encouraging.
import networkx as nx
from itertools import combinations

def find_articulation_pairs(G):
    solutions = []
    for removed_node in G:
        # subgraph views are much faster to initialize than copies
        H = G.subgraph([node for node in G if node != removed_node])
        solutions.extend([(removed_node, node) for node in nx.articulation_points(H)])
    return solutions

def find_articulation_pairs_baseline(G):
    # the original approach with minor improvements
    total_components = nx.number_connected_components(G)  # no need to compute this number on each iteration
    solutions = []
    for removed_nodes in combinations(G, 2):
        H = G.subgraph([node for node in G if node not in removed_nodes])
        if nx.number_connected_components(H) > total_components:
            solutions.append(removed_nodes)
    return solutions

def test_correctness():
    G = nx.diamond_graph()
    print(find_articulation_pairs(G))
    print(find_articulation_pairs_baseline(G))

def test_running_time():
    import time

    # large random graph
    G = nx.erdos_renyi_graph(100, 0.1)
    # largest biconnected subgraph
    H = G.subgraph(max(nx.biconnected_components(G), key=len))

    tic = time.time()
    _ = find_articulation_pairs(H)
    toc = time.time()
    _ = find_articulation_pairs_baseline(H)
    tac = time.time()

    print(f"Baseline time: {tac - toc:.2f}")
    print(f"Improved time: {toc - tic:.2f}")

if __name__ == "__main__":
    test_correctness()
    test_running_time()
Running the script results in the following output:
[(1, 2), (2, 1)]
[(1, 2)]
Baseline time: 14.72
Improved time: 0.44
Both approaches are trivial to parallelize (see the sketch below), so depending on the number of available cores there is a further speed-up of factor 4-12 to be had.
Some back-of-the-envelope calculations suggest that for 10k nodes, the running time of my improved approach (on my machine, without parallelization) would still be around 2 days, so parallelization is probably not optional but a must.
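For illustration, here is a minimal sketch of how the outer loop of the improved approach could be parallelized with the standard-library multiprocessing module. The helper names (_init_worker, _pairs_for_node) and the pool setup are my own assumptions, not part of the measured code above:

import networkx as nx
from multiprocessing import Pool

_G = None

def _init_worker(G):
    # runs once per worker process; keeps G out of the per-task payload
    global _G
    _G = G

def _pairs_for_node(removed_node):
    # fix one removed node, then find articulation points of the remainder
    H = _G.subgraph([node for node in _G if node != removed_node])
    return [(removed_node, node) for node in nx.articulation_points(H)]

def find_articulation_pairs_parallel(G, processes=None):
    # processes=None lets the pool default to the number of available cores
    with Pool(processes, initializer=_init_worker, initargs=(G,)) as pool:
        chunks = pool.map(_pairs_for_node, list(G))
    return [pair for chunk in chunks for pair in chunk]

if __name__ == "__main__":
    G = nx.diamond_graph()
    print(find_articulation_pairs_parallel(G))
    # [(1, 2), (2, 1)]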