Find connected branches from list of line segments

Problem

I have a list of line segments:

exampleLineSegments = [(1,2),(2,3),(3,4),(4,5),(5,6),(4,7),(8,7)]

These segments include the indices of the corresponding point in a separate array.

From this sublist, one can see that there is a branching point (4). So three different branches are emerging from this branching point. (In other, more specific problems, there might be / are multiple branching points for n branches.)

Target

My target is to get a dictionary including information about the existing branches, so e.g.:

result = { branch_1: [1,2,3,4],
           branch_2: [4,5,6],
           branch_3: [4,7,8]}

Current state of work/problems

Currently, I am identifying the branch points first by setting up a dictionary for each point and checking for each entry if there are more than 2 neighbor points found. This means that there is a branching point.

Afterwards I am crawling through all points emerging from these branch points, checking for successors etc.

In these functions, there are a some for loops and generally an intensive "crawling". This is not the cleanest solution and if the number of points increasing, the performance is not so good either.

Question

What is the best / fastest / most performant way to achieve the target in this case?

Solution

I think you can achieve it by following steps:

use a neighbors dict to store the graph
find all branch points, which neighbors count > 2
start from every branch point, and use dfs to find all the paths

from collections import defaultdict
def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)

    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]

    res = []

    def dfs(cur, prev, path):
        # reach the leaf
        if len(neighbors[cur]) == 1:
            res.append(path)
            return
        for neighbor in neighbors[cur]:
            if neighbor != prev:
                dfs(neighbor, cur, path + [neighbor])

    # start from all the branch points
    for branch_point in branch_points:
        dfs(branch_point, None, [branch_point])

    return res

update an iteration version, for big data, which may cause a recursion depth problem:

def find_branch_paths(exampleLineSegments):
    # use dict to store the graph
    neighbors = defaultdict(list)
    for p1, p2 in exampleLineSegments:
        neighbors[p1].append(p2)
        neighbors[p2].append(p1)

    # find all branch points
    branch_points = [k for k, v in neighbors.items() if len(v) > 2]

    res = []
    # iteration way to dfs
    stack = [(bp, None, [bp]) for bp in branch_points]
    while stack:
        cur, prev, path = stack.pop()

        if len(neighbors[cur]) == 1 or (prev and cur in branch_points):
            res.append(path)
            continue

        for neighbor in neighbors[cur]:
            if neighbor != prev:
                stack.append((neighbor, cur, path + [neighbor]))

    return res

test and output:

print(find_branch_paths([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7), (8, 7)]))
# output: 
# [[4, 3, 2, 1], [4, 5, 6], [4, 7, 8]]

Hope that helps you, and comment if you have further questions. : )

UPDATE: if there are many branch points, the path will grow exponentially. So if you only want distinct segments, you can end the path when encounter another branch point.

change this line
if len(neighbors[cur]) == 1:
to
if len(neighbors[cur]) == 1 or (prev and cur in branch_points):