Search code examples
pythonpython-2.7dictionaryheapq

Strange interferences bewteen Heapq module and dictionary


On one hand, I have a grid defaultdict that stores the neighboring nodes of each node on a grid and its weight (all 1 in the example below).

        node   (w  nbr_node)
grid = { 0:   [(1, -5), (1, -4), (1, -3), (1, -1), (1, 1), (1, 3), (1, 4), (1, 5)], 
         1:   [(1, -4), (1, -3), (1, -2), (1, 0), (1, 2), (1, 4), (1, 5), (1, 6)], 
         2:   [(1, -3), (1, -2), (1, -1), (1, 1), (1, 3), (1, 5), (1, 6), (1, 7)], 
         3:   [(1, -2), (1, -1), (1, 0), (1, 2), (1, 4), (1, 6), (1, 7), (1, 8)],
        ...
        }

On the other, I have a Djisktra function that computes the shortest path between 2 nodes on this grid. The algorithm uses the heapq module and works perfectly fine.

import heapq

def Dijkstra(s, e, grid): #startpoint, endpoint, grid
    visited = set()
    distances = {s: 0} 
    p = {} 
    queue = [(0, s)] 

    while queue != []:

        weight, node = heappop(queue) 
        if node in visited: 
            continue

        visited.add(node) 

        for n_weight, n_node in grid[node]: 
            if n_node in visited: 
                continue

            total = weight + n_weight 

            if n_node not in distances or distances[n_node] > total: 

                distances[n_node] = total
                heappush(queue, (total, n_node))
                p[n_node] = node

Problem: when calling the Djikstra function multiple times, heappush is... adding new keys in the grid dictionary for no reason !

Here is a MCVE:

from collections import defaultdict

# Creating the dictionnary
grid = defaultdict(list) 
N = 4
kernel = (-N-1, -N, -N+1, -1, 1, N-1, N, N+1)

for i in range(N*N): 
    for n in kernel:
        if i > N and i < (N*N) - 1 - N and (i%N) > 0 and (i%N) < N - 1:
            grid[i].append((1, i+n))



# Calling Djikstra multiple times
keys = [*range(N*N)]

while keys:

    k1, k2 = random.sample(keys, 2)

    Dijkstra(k1, k2, grid) 

    keys.remove(k1)
    keys.remove(k2)

The original grid defaultdict:

dict_keys([5, 6, 9, 10])

...and after calling the Djikstra function multiple times:

dict_keys([5, 6, 9, 10, 4, 0, 1, 2, 8, 3, 7, 11, 12, 13, 14, 15])

When calling the Djikstra function multiple times without heappush (just commenting heappush at the end):

dict_keys([5, 6, 9, 10])

Question:

  • How can I avoid this strange behavior ?

Please note that I'm using Python 2.7 and can't use numpy.


Solution

  • I could reproduce and fix. The problem is in the way you are building grid: it contains values that are not in keys from -4 to 0 and from 16 to 20 in the example. So you push those inexistant nodes on the head, and later pop them.

    And you end in executing for n_weight, n_node in grid[node]: where node does not (still) exists in grid. As grid is a defaultdict, a new node is automatically inserted with an empty list as value.

    The fix is trivial (at least for the example data): it is enough to ensure that all nodes added as value is grid exist as key with a modulo:

    for i in range(N*N): 
        for n in kernel:
            grid[i].append((1, (i+n + N + 1)%(N*N)))
    

    But even for real data it should not be very hard to ensure that all nodes existing in grid values also exist in keys...

    BTW, if grid had been a simple dict the error would have been immediate with a KeyError on grid[node].