Search code examples
pythonalgorithmrecursiongraphnetworkx

Fastest way to convert an undirected cyclic graph (UCG) to a directed acyclic graph (DAG) in Python?


Say I have an undirected cyclic graph (UCG). The weights of all edges are 1. Therefore, this UCG can be represented by an adjacency matrix A:

import numpy as np

A = np.array([[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
              [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]])

To visualize the UCG, we can simply convert it to a networkx.Graph object by

import networkx as nx

ucg = nx.Graph()
rows, cols = np.where(A == 1)
edges = zip(rows.tolist(), cols.tolist())
ucg.add_edges_from(edges)

And the UCG looks like this: enter image description here

I colored the nodes in different colors in order to show the "minimum distance". The orange nodes {8, 9, 10} are the starting nodes, the green nodes {0, 1, 2, 3} are the nodes that has the minimum distance of 1 to the starting nodes, and the blue nodes {4, 5, 6, 7} has the minimum distance of 2. Now I want to convert this into a directed acyclic graph (DAG) with the arrows pointing from starting nodes to distance-1 nodes to distance-2 nodes and so on. The edges between the nodes with same "minimum distance" are discarded.

The expected output is a dictionary that represents the DAG:

d = {8: {1, 3},
     9: {1, 2},
     10: {0, 2, 3},
     0: {4, 6, 7},
     1: {5, 6, 7},
     2: {4, 5, 6},
     3: {4, 5, 7}}

Similarly, to visualize the DAG, we can convert it into a networkx.DiGraph object by

dag = nx.DiGraph()
dag.add_edges_from([(k, v) for k, vs in d.items() in for v in vs])

And the expected output DAG looks like this: enter image description here

I want to write an efficient and general code to convert a given UCG with given starting nodes to the corresponding DAG.

What I have tried

Clearly, a recursion is called for. My idea is to use a BFS approach to find the 1-distance nodes for each starting nodes, then their 1-distance nodes, and the recursion goes on and on. All visited nodes are stored in a set prev_starts to avoid going backwards. Below is my code

from collections import defaultdict

def ucg2dag(A, starts):
    """Takes the adjacency matrix of a UCG and the indices of the
    starting nodes, returns the dictionary of a DAG."""

    def recur(starts):
        starts = list(set(starts))
        idxs, nbrs = np.where(A[starts] == 1)
        prev_starts.update(starts)

        # Filter out the neighbors that are previous starts so the
        # arrows do not point backwards
        try:
            idxs, nbrs = zip(*((idx, nbr) for idx, nbr in zip(idxs, nbrs)
                                if nbr not in prev_starts))
        # Terminate if every neighbor is a previous start.
        except:
            return d

        for idx, nbr in zip(idxs, nbrs):
            d[starts[idx]].add(nbr)

        return recur(starts=nbrs)

    prev_starts = set()
    d = defaultdict(set)
    return recur(starts)

Testing my code:

d = ucg2dag(A, starts={8, 9, 10})
print(d)

Edit: after adding return before recur thanks to @trincot's comment, I am able to get the correct output:

defaultdict(<class 'set'>, 
            {8: {1, 3}, 
             9: {1, 2}, 
             10: {0, 2, 3}, 
             0: {4, 6, 7}, 
             1: {5, 6, 7}, 
             2: {4, 5, 6}, 
             3: {4, 5, 7}})
%timeit 37.6 µs ± 591 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In reality I have a much larger graph. I want to know if there are any algorithm that is more efficient?


Solution

  • You have applied some fixes to your code (partially based on comments), so that now you have working code.

    The only few remarks that remain are:

    • BFS is typically not a recursive algorithm (in contrast to DFS): the recursion you have is a case of tail recursion. In that case it can be written as a loop, and you'll avoid the use of the stack.

    • It is a pity you have to look up the edges in an adjacency matrix. It would be better to first convert the adjacency matrix to an adjacency list, unless the graph is really dense.

    • The output could also be an adjacency list, with an entry for each node, and that way it could be a list of lists instead of a dictionary

    • The repeated conversions of structure using zip may not be the most efficient (I didn't benchmark though)

    Without using numpy, it could look like this:

    def ucg2dag(adj_matrix, starts):
        adj_list = [
            [target for target, is_connected in enumerate(row) if is_connected]
                for row in adj_matrix
        ]
    
        frontier = starts
    
        dag = [[] for _ in range(len(adj_list))]
    
        while frontier:
            for source in frontier:
                dag[source].extend(target for target in adj_list[source] if not target in starts)
            frontier = set(target 
                for source in frontier for target in adj_list[source] if not target in starts
            )
            starts.update(frontier)
    
        return dag
    

    Example run:

    adj_matrix = [[0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
                  [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
                  [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
                  [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
                  [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
                  [0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
                  [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                  [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]]
    
    dag = ucg2dag(adj_matrix, {8, 9, 10})
    print(dag)  
    

    The output for the example run:

    [[4, 6, 7], [5, 6, 7], [4, 5, 6], [4, 5, 7], [], [], [], [], [1, 3], [1, 2], [0, 2, 3]]