Say I have an undirected cyclic graph (UCG). The weights of all edges are 1. Therefore, this UCG can be represented by an adjacency matrix A
:
import numpy as np
A = np.array([[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
[1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
[1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
[1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
[1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]])
To visualize the UCG, we can simply convert it to a networkx.Graph
object by
import networkx as nx
ucg = nx.Graph()
rows, cols = np.where(A == 1)
edges = zip(rows.tolist(), cols.tolist())
ucg.add_edges_from(edges)
I colored the nodes in different colors in order to show the "minimum distance". The orange nodes {8, 9, 10}
are the starting nodes, the green nodes {0, 1, 2, 3}
are the nodes that has the minimum distance of 1 to the starting nodes, and the blue nodes {4, 5, 6, 7}
has the minimum distance of 2. Now I want to convert this into a directed acyclic graph (DAG) with the arrows pointing from starting nodes to distance-1 nodes to distance-2 nodes and so on. The edges between the nodes with same "minimum distance" are discarded.
The expected output is a dictionary that represents the DAG:
d = {8: {1, 3},
9: {1, 2},
10: {0, 2, 3},
0: {4, 6, 7},
1: {5, 6, 7},
2: {4, 5, 6},
3: {4, 5, 7}}
Similarly, to visualize the DAG, we can convert it into a networkx.DiGraph
object by
dag = nx.DiGraph()
dag.add_edges_from([(k, v) for k, vs in d.items() in for v in vs])
And the expected output DAG looks like this:
I want to write an efficient and general code to convert a given UCG with given starting nodes to the corresponding DAG.
Clearly, a recursion is called for. My idea is to use a BFS approach to find the 1-distance nodes for each starting nodes, then their 1-distance nodes, and the recursion goes on and on. All visited nodes are stored in a set prev_starts
to avoid going backwards. Below is my code
from collections import defaultdict
def ucg2dag(A, starts):
"""Takes the adjacency matrix of a UCG and the indices of the
starting nodes, returns the dictionary of a DAG."""
def recur(starts):
starts = list(set(starts))
idxs, nbrs = np.where(A[starts] == 1)
prev_starts.update(starts)
# Filter out the neighbors that are previous starts so the
# arrows do not point backwards
try:
idxs, nbrs = zip(*((idx, nbr) for idx, nbr in zip(idxs, nbrs)
if nbr not in prev_starts))
# Terminate if every neighbor is a previous start.
except:
return d
for idx, nbr in zip(idxs, nbrs):
d[starts[idx]].add(nbr)
return recur(starts=nbrs)
prev_starts = set()
d = defaultdict(set)
return recur(starts)
Testing my code:
d = ucg2dag(A, starts={8, 9, 10})
print(d)
Edit: after adding return
before recur
thanks to @trincot's comment, I am able to get the correct output:
defaultdict(<class 'set'>,
{8: {1, 3},
9: {1, 2},
10: {0, 2, 3},
0: {4, 6, 7},
1: {5, 6, 7},
2: {4, 5, 6},
3: {4, 5, 7}})
%timeit 37.6 µs ± 591 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In reality I have a much larger graph. I want to know if there are any algorithm that is more efficient?
You have applied some fixes to your code (partially based on comments), so that now you have working code.
The only few remarks that remain are:
BFS is typically not a recursive algorithm (in contrast to DFS): the recursion you have is a case of tail recursion. In that case it can be written as a loop, and you'll avoid the use of the stack.
It is a pity you have to look up the edges in an adjacency matrix. It would be better to first convert the adjacency matrix to an adjacency list, unless the graph is really dense.
The output could also be an adjacency list, with an entry for each node, and that way it could be a list of lists instead of a dictionary
The repeated conversions of structure using zip
may not be the most efficient (I didn't benchmark though)
Without using numpy, it could look like this:
def ucg2dag(adj_matrix, starts):
adj_list = [
[target for target, is_connected in enumerate(row) if is_connected]
for row in adj_matrix
]
frontier = starts
dag = [[] for _ in range(len(adj_list))]
while frontier:
for source in frontier:
dag[source].extend(target for target in adj_list[source] if not target in starts)
frontier = set(target
for source in frontier for target in adj_list[source] if not target in starts
)
starts.update(frontier)
return dag
Example run:
adj_matrix = [[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
[1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
[1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
[1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
[1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
[1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]
dag = ucg2dag(adj_matrix, {8, 9, 10})
print(dag)
The output for the example run:
[[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7], [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]