Search code examples
pythonalgorithmfor-loopoptimizationdepth-first-search

Optimizing python DFS (for loop is inefficient)


Given the following function, what would be the correct and pythonic way to archiving the same (and faster) result?

My code is not efficient and I believe I'm missing something that is staring at me.

The idea is to find a pattern that is [[A,B],[A,C],[C,B]] without having to generate additional permutations (since this will result in a higher processing time for the comparisons).

The length of the dictionary fed into find_path in real-life would be approximately 10,000, so having to iterate over that amount with the current code version below is not efficient.

from time import perf_counter
from typing import List, Generator, Dict


def find_path(data: Dict) -> Generator:
    for first_pair in data:

        pair1: List[str] = first_pair.split("/")

        for second_pair in data:
            pair2: List[str] = second_pair.split("/")
            if pair2[0] == pair1[0] and pair2[1] != pair1[1]:

                for third_pair in data:
                    pair3: List[str] = third_pair.split("/")

                    if pair3[0] == pair2[1] and pair3[1] == pair1[1]:

                        amount_pair_1: int = data.get(first_pair)[
                            "amount"
                        ]
                        id_pair_1: int = data.get(first_pair)["id"]

                        amount_pair_2: int = data.get(second_pair)[
                            "amount"
                        ]
                        id_pair_2: int = data.get(second_pair)["id"]

                        amount_pair_3: int = data.get(third_pair)[
                            "amount"
                        ]
                        id_pair_3: int = data.get(third_pair)["id"]

                        yield (
                            pair1,
                            amount_pair_1,
                            id_pair_1,
                            pair2,
                            amount_pair_2,
                            id_pair_2,
                            pair3,
                            amount_pair_3,
                            id_pair_3,
                        )


raw_data = {
    "EZ/TC": {"id": 1, "amount": 9},
    "LM/TH": {"id": 2, "amount": 8},
    "CD/EH": {"id": 3, "amount": 7},
    "EH/TC": {"id": 4, "amount": 6},
    "LM/TC": {"id": 5, "amount": 5},
    "CD/TC": {"id": 6, "amount": 4},
    "BT/TH": {"id": 7, "amount": 3},
    "BT/TX": {"id": 8, "amount": 2},
    "TX/TH": {"id": 9, "amount": 1},
}


processed_data = list(find_path(raw_data))

for i in processed_data:
    print(("The path to traverse is:", i))

>> ('The path to traverse is:', (['CD', 'TC'], 4, 6, ['CD', 'EH'], 7, 3, ['EH', 'TC'], 6, 4))
>> ('The path to traverse is:', (['BT', 'TH'], 3, 7, ['BT', 'TX'], 2, 8, ['TX', 'TH'], 1, 9))
>> ('Time to complete', 5.748599869548343e-05)

# Timing for a simple ref., as mentioned above, the raw_data is a dict containing about 10,000 keys

Solution

  • You can't do that with this representation of the graph. This algorithm has O(|E|^3) time complexity. It is a good idea to store edges as array of lists, each list will store only adjacent vertexes. And then it is easy to do what you need. Fortunately, you can re-represent graph in O(|E|) time.

    How to do that

    We will store graph as array of vertices (but in this case because of string vertex-values we take a dictionary). We want to access in all neighbours by a vertex. Let's do that -- we will store in the array lists of all neighbours of the given vertex.

    Now we just need to construct our structure by set of edges (aka row_data). How to add an edge in graph? Easy! We should find a vertex from in our array and add a vertex to to the list of it's neighbours

    So, the construct_graph function could be like:

    def construct_graph(raw_data):  # here we will change representation
        graph = defaultdict(list)   # our graph
        for pair in raw_data:       # go through every edge
            u, v = pair.split("/")  # get from and to vertexes
            graph[u].append(v)      # and add this edge in our structure
        return graph                # return our new graph to other functions
    

    How to find path length 2

    We will use dfs on our graph.

    def dfs(g, u, dist):                # this is a simple dfs function
        if dist == 2:                   # we has a 'dist' from our start
            return [u]                  # and if we found already answer, return it
        for v in g.get(u, []):          # otherwise check all neighbours of current vertex
            ans = dfs(g, v, dist + 1)   # run dfs in every neighbour with dist+1
            if ans:                     # and if that dfs found something
                ans.append(u)           # store it in ouy answer
                return ans              # and return it
        return []                       # otherwise we found nothing
    

    And then we just try it for every vertex.

    def main():
        graph = construct_graph(raw_data)
        for v in graph.keys():              # here we will try to find path
            ans = dfs(graph, v, 0)          # starting with 0 dist
            if ans:                         # and if we found something
                print(list(reversed(ans)))  # return it, but answer will be reversed