I wrote a DFS-like algorithm to find all possible paths starting from the zero level. With 2,000 nodes and 5,000 edges, the code below is extremely slow. Any suggestions for this algorithm?
all_path = []

def printAllPathsUntil(s, path):
    path.append(s)
    if s not in adj or len(adj[s]) <= 0:
        all_path.append(path[:])
    else:
        for i in adj[s]:
            printAllPathsUntil(i, path)
    path.pop()

for point in points_in_start:
    path = []
    printAllPathsUntil(point, path)
And adj holds the edges: the start node as key and the list of destination nodes as value.
points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [],
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2]
       }
The problem with your algorithm is that it can do a lot of repeated work. This does not happen in your example, because the only nodes that are reached from more than one parent (2 and 6) are leaf nodes. But imagine an edge from, say, 8 to 1: the entire sub-graph starting at 1 would be visited a second time. For a graph with 2,000 nodes and 5,000 edges, this repeated traversal adds up to a significant slow-down.
To counter this, you can use memoization, but that means restructuring your algorithm: instead of appending to the existing path and then adding that path to all_paths, the function has to return the (partial) paths starting at the current node, and the parent node combines those into its own full paths. You can then use functools.lru_cache to re-use all those partial results when you reach a node again from another parent.
from functools import lru_cache

@lru_cache(None)
def getAllPathsUntil(s):
    # A node with no outgoing edges ends exactly one path: itself.
    if s not in adj or not adj[s]:
        return [[s]]
    # Otherwise, prepend s to every path starting at any successor.
    return [[s, *p] for a in adj[s]
                    for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
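One caveat worth adding (my note, not part of the original answer): the cache keys only on the node, so if you later modify adj, the memoized results go stale and you need to reset them with cache_clear(). Here is a self-contained run with the graph from the question (the definitions are repeated so the snippet runs on its own):

```python
from functools import lru_cache

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [[s]]
    return [[s, *p] for a in adj[s] for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
print(all_paths)
# → [[0, 1, 2], [0, 1, 5, 6], [0, 8, 2], [3, 2], [3, 4], [7, 6]]

# After editing the graph, drop the stale memo before querying again.
adj[2] = [4]                    # node 2 is no longer a leaf
getAllPathsUntil.cache_clear()
print(getAllPathsUntil(8))      # → [[8, 2, 4]]
```

Also note that the cached lists are shared between results, so callers should not mutate the returned paths in place.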