python algorithm for-loop optimization depth-first-search

Optimizing python DFS (for loop is inefficient)

Given the following function, what would be the correct and pythonic way to archiving the same (and faster) result?

My code is not efficient and I believe I'm missing something that is staring at me.

The idea is to find a pattern that is [[A,B],[A,C],[C,B]] without having to generate additional permutations (since this will result in a higher processing time for the comparisons).

The length of the dictionary fed into find_path in real-life would be approximately 10,000, so having to iterate over that amount with the current code version below is not efficient.

from time import perf_counter
from typing import List, Generator, Dict


def find_path(data: Dict) -> Generator:
    for first_pair in data:

        pair1: List[str] = first_pair.split("/")

        for second_pair in data:
            pair2: List[str] = second_pair.split("/")
            if pair2[0] == pair1[0] and pair2[1] != pair1[1]:

                for third_pair in data:
                    pair3: List[str] = third_pair.split("/")

                    if pair3[0] == pair2[1] and pair3[1] == pair1[1]:

                        amount_pair_1: int = data.get(first_pair)[
                            "amount"
                        ]
                        id_pair_1: int = data.get(first_pair)["id"]

                        amount_pair_2: int = data.get(second_pair)[
                            "amount"
                        ]
                        id_pair_2: int = data.get(second_pair)["id"]

                        amount_pair_3: int = data.get(third_pair)[
                            "amount"
                        ]
                        id_pair_3: int = data.get(third_pair)["id"]

                        yield (
                            pair1,
                            amount_pair_1,
                            id_pair_1,
                            pair2,
                            amount_pair_2,
                            id_pair_2,
                            pair3,
                            amount_pair_3,
                            id_pair_3,
                        )


raw_data = {
    "EZ/TC": {"id": 1, "amount": 9},
    "LM/TH": {"id": 2, "amount": 8},
    "CD/EH": {"id": 3, "amount": 7},
    "EH/TC": {"id": 4, "amount": 6},
    "LM/TC": {"id": 5, "amount": 5},
    "CD/TC": {"id": 6, "amount": 4},
    "BT/TH": {"id": 7, "amount": 3},
    "BT/TX": {"id": 8, "amount": 2},
    "TX/TH": {"id": 9, "amount": 1},
}


processed_data = list(find_path(raw_data))

for i in processed_data:
    print(("The path to traverse is:", i))

>> ('The path to traverse is:', (['CD', 'TC'], 4, 6, ['CD', 'EH'], 7, 3, ['EH', 'TC'], 6, 4))
>> ('The path to traverse is:', (['BT', 'TH'], 3, 7, ['BT', 'TX'], 2, 8, ['TX', 'TH'], 1, 9))
>> ('Time to complete', 5.748599869548343e-05)

# Timing for a simple ref., as mentioned above, the raw_data is a dict containing about 10,000 keys

Solution

You can't do that with this representation of the graph. This algorithm has O(|E|^3) time complexity. It is a good idea to store edges as array of lists, each list will store only adjacent vertexes. And then it is easy to do what you need. Fortunately, you can re-represent graph in O(|E|) time.

How to do that

We will store graph as array of vertices (but in this case because of string vertex-values we take a dictionary). We want to access in all neighbours by a vertex. Let's do that -- we will store in the array lists of all neighbours of the given vertex.

Now we just need to construct our structure by set of edges (aka row_data). How to add an edge in graph? Easy! We should find a vertex from in our array and add a vertex to to the list of it's neighbours

So, the construct_graph function could be like:

def construct_graph(raw_data):  # here we will change representation
    graph = defaultdict(list)   # our graph
    for pair in raw_data:       # go through every edge
        u, v = pair.split("/")  # get from and to vertexes
        graph[u].append(v)      # and add this edge in our structure
    return graph                # return our new graph to other functions

How to find path length 2

We will use dfs on our graph.

def dfs(g, u, dist):                # this is a simple dfs function
    if dist == 2:                   # we has a 'dist' from our start
        return [u]                  # and if we found already answer, return it
    for v in g.get(u, []):          # otherwise check all neighbours of current vertex
        ans = dfs(g, v, dist + 1)   # run dfs in every neighbour with dist+1
        if ans:                     # and if that dfs found something
            ans.append(u)           # store it in ouy answer
            return ans              # and return it
    return []                       # otherwise we found nothing

And then we just try it for every vertex.

def main():
    graph = construct_graph(raw_data)
    for v in graph.keys():              # here we will try to find path
        ans = dfs(graph, v, 0)          # starting with 0 dist
        if ans:                         # and if we found something
            print(list(reversed(ans)))  # return it, but answer will be reversed