Problem
I have a list of line segments:
exampleLineSegments = [(1,2),(2,3),(3,4),(4,5),(5,6),(4,7),(8,7)]
These segments include the indices of the corresponding point in a separate array.
From this sublist, one can see that there is a branching point (4). So three different branches are emerging from this branching point. (In other, more specific problems, there might be / are multiple branching points for n branches.)
Target
My target is to get a dictionary including information about the existing branches, so e.g.:
result = { branch_1: [1,2,3,4],
branch_2: [4,5,6],
branch_3: [4,7,8]}
Current state of work/problems
Currently, I am identifying the branch points first by setting up a dictionary for each point and checking for each entry if there are more than 2 neighbor points found. This means that there is a branching point.
Afterwards I am crawling through all points emerging from these branch points, checking for successors etc.
In these functions, there are a some for loops and generally an intensive "crawling". This is not the cleanest solution and if the number of points increasing, the performance is not so good either.
Question
What is the best / fastest / most performant way to achieve the target in this case?
I think you can achieve it by following steps:
neighbors
dict to store the graphfrom collections import defaultdict
def find_branch_paths(exampleLineSegments):
# use dict to store the graph
neighbors = defaultdict(list)
for p1, p2 in exampleLineSegments:
neighbors[p1].append(p2)
neighbors[p2].append(p1)
# find all branch points
branch_points = [k for k, v in neighbors.items() if len(v) > 2]
res = []
def dfs(cur, prev, path):
# reach the leaf
if len(neighbors[cur]) == 1:
res.append(path)
return
for neighbor in neighbors[cur]:
if neighbor != prev:
dfs(neighbor, cur, path + [neighbor])
# start from all the branch points
for branch_point in branch_points:
dfs(branch_point, None, [branch_point])
return res
update an iteration
version, for big data, which may cause a recursion depth problem
:
def find_branch_paths(exampleLineSegments):
# use dict to store the graph
neighbors = defaultdict(list)
for p1, p2 in exampleLineSegments:
neighbors[p1].append(p2)
neighbors[p2].append(p1)
# find all branch points
branch_points = [k for k, v in neighbors.items() if len(v) > 2]
res = []
# iteration way to dfs
stack = [(bp, None, [bp]) for bp in branch_points]
while stack:
cur, prev, path = stack.pop()
if len(neighbors[cur]) == 1 or (prev and cur in branch_points):
res.append(path)
continue
for neighbor in neighbors[cur]:
if neighbor != prev:
stack.append((neighbor, cur, path + [neighbor]))
return res
test and output:
print(find_branch_paths([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# output:
# [[4, 3, 2, 1], [4, 5, 6], [4, 7, 8]]
Hope that helps you, and comment if you have further questions. : )
UPDATE: if there are many branch points, the path will grow exponentially. So if you only want distinct segments, you can end the path when encounter another branch point.
change this line
if len(neighbors[cur]) == 1:
to
if len(neighbors[cur]) == 1 or (prev and cur in branch_points):