Search code examples
pythondirected-acyclic-graphsgraph-traversal

How do I enumerate all paths in a Directed Acyclic Graph structure using Python, when the data structure can only be discovered via traversal?


Please allow me to clarify a few details before beginning:

  1. I've really worked hard to research this problem on my own. I have looked at solutions like this or this but all those solutions assume you have the entire data structure in memory.
  2. This is for something practical, and is not a homework assignment or job interview question. I've attempted to come up with a solution and have been unable, even though I know it's theoretically possible.
  3. The specifics: I am using python/Selenium to traverse a Twine file, but I'm showing a generalized pseudocode interface because those details aren't important to solving the problem, only to demonstrate that I need to traverse the path to discover it.

So, I have a Directed Acyclic Graph with about 150 nodes. There is a single start node, and many end nodes.

Imagine I have an interface similar to this to traverse the graph:

class TheGraphToTraverse:

    def get_the_first_node(self):
        pass

    def get_the_list_of_edges_from_the_current_node(self):
        """
        Returns list of links/edges that can be followed
        """

    def follow_this_edge(self, edge):
        """
        Follow an edge link to its destination node
        """

    def go_back_one_step(self):
        """
        Navigate back one step
        """

    def go_back_to_the_start_node(self):
        """
        Start over from the beginning node
        """

    def is_this_node_an_end_node(self):
        """
        Is the current node an end node with no further edges to traverse?
        """

How would I adapt a (recursive or non-recursive) algorhithm like these but without needing to pass the graph itself into the recursive function:

data = {1 : [2,3],   # Directed acyclic graph adjacency list
        2 : [3],
        3 : [4,5],
        4 : [5]}

def dfs(data, path, paths = []):   
    datum = path[-1]              
    if datum in data:
        for val in data[datum]:
            new_path = path + [val]
            paths = dfs(data, new_path, paths)
    else:
        paths += [path]
    return paths

See how this algorithm (and all I can find online) has you passing the entire data structure in? I don't have the graph as a data structure and need to discover it during traversal, so I need to be able to keep some kind of state on my current pathway, and I'm having trouble seeing clearly how to store that state reliably.

I've gotten this far:

    list_of_all_paths = []
    g = TheGraphToTraverse()
    first_node = g.get_the_first_node()
    list_of_all_paths = dfs(g, first_node, list_of_all_paths)

    def dfs(g: TheGraphToTraverse, path, all_paths):

        # check for end state
        if g.is_this_node_an_end_node:
            all_paths += [path]
            g.go_back_one_step()
            # I recognize I'm not sure how to store 
            # the previous edges I've stored for this path
            all_paths = dfs(g, path, all_paths)
        else:
            edges = g.get_the_list_of_edges_from_the_current_node()
            # how do I know which of these I've already tried?
            # how do I know if this is a list of 
            # edges I've already tried?
            for edge in edges:
                g.follow_this_edge(edge)
                all_paths = dfs(g, path, all_paths)
        return all_paths

Any help or adapted algorithm, even if it's not in Python, or is not recursive is fine. Thanks so much for engaging with this.


Solution

  • You do not currently expose a method to get any node other than the first node, so I'm assuming you're looking for the edges only?

    I've written a generator, because I've found that using lists in recursive functions tends to lead to bugs where lists are mutated when they shouldn't be or vice versa. For the same reason, I've used tuples instead of lists for sequences of edges.

    def dfs(g: TheGraphToTraverse, current_path=()):
        '''Depth first search on an unbuilt graph. ``current_path`` is a tuple of edges, forming a single traversal path.
    
        Returns an iterator of tuples of edges.
        '''
        if g.is_this_node_an_end_node():
            yield current_path
        else:
            edges = g.get_the_list_of_edges_from_the_current_node()
            for edge in edges:
                # A "sandwich" of graph traversal management with the recursion in between.
                # This ensures that every follow_this_edge is paired with a go_back_one_step.
                g.follow_this_edge(edge)
                yield from dfs(g, current_path + (edge,))
                g.go_back_one_step()
    
    g = TheGraphToTraverse()
    list_of_all_paths = list(dfs(g))  # type: list[tuple[edge_type, ...]]