Search code examples
algorithmpathruntimehadoopgraph-traversal

All pairs all paths on a graph


This is possibly a problem with possibly no optimal solution. Suppose I have a directed graph, have no clue whether it has any cycles or not(cycle detection will be one of the aspects of this problem). Given a set of vertices(possibly millions of vertices) I need to count all the distinct paths(paths with no duplicate vertices) between all the unique pairs of the given graph. How would I go about tackling this situation?

Lets look at a brute-force way to do this:

  • Compute all possible pairs from the graph.

  • For each pair of the graph use DFS to get all the paths from Source to Destination.

  • Assuming the pairs are represented in a hash table, put the path count as the value against this pair.
  • Repeat for the rest of the pairs.

Can people point out what are some of the things that can go wrong with this? Lets think of the problem in this manner, what are the computational challenges of finding all the distinct paths between all the cities of the planet and if one would even attempt to solve this problem, where would one start?

Edit: Some of the algorithms presented produce results in O(n!) factorial time. This is somewhat of a daunting computation for a single machine with limited resources to handle. Does anyone know of map-reduce techniques to break down the problem of graph traversal into smaller chunks?


Solution

  • The Floyd-Warshall generalization that gets a rough approximation of paths is this:

     procedure FloydWarshall ()
        for k := 1 to n
           for i := 1 to n
              for j := 1 to n
                 path[i][j] = path[i][j] + path[i][k]*path[k][j] 
    

    Here is a very rough idea on how to scale this. Disclaimer - This isn't concrete -it's very hand wavy, but hopefully it helps more than confuses. It helps to understand how the algorithm works. It starts on an adjacency matrix of the graph. In each iteration k, we're saying the number of paths from i to j is equal to the number of paths in prior iterations from i to j plus the number of paths from i to j via k.

    Thus Break the graph in to n arbitrary sub-adjacency matricies of size k x k, compute above on each of them. Now you have the number of paths and start combining submatricies by recomputing part of the above on the combined matrix. I.e, we only need to recompute n/2 values for k when recombining. I found this, which looks like a similar direction http://theory.stanford.edu/~oldham/publications/generalized/asgsp.pdf.