Search code examples
cgraphshortest-pathdijkstragraph-traversal

Why does my implementation of Dijkstra's algorithm not behave as it should?


I am writing an implementation of Dijkstra's algorithm to learn about cool graph algorithms (this isn't a homework assignment, FYI). I am using Wikipedia's description of the algorithm as my main resource.

I have tested different traversal paths and gotten the following results ((foo, bar) means foo to bar):

crashes:
    (e, f)
    (f, d)
    (c, a)
    (f, g)
incorrect:
    (a, c)
    (g, f)
working:
    (d, f)

My graph that I am working with looks like this:

       F - H
       |   |
A ---- B - C
|         /
|        /
E - G - D

By tracing the path from E to F, I understand mostly why my code is failing. The other problem is that I don't know how to implement the algorithm using my way of doing it otherwise. Here's a tracing from E to F:

At node E, my neighbors are A and G. The shortest tentative distance is G, so that's the next current node. G's neighbors are E and D, but E was already traversed, so C is the next one. For C, its neighbor D was traversed, so we now arrive at B (B and H are equidistant, but it was chosen first in C's list of edges). Here is where my problem lies:

A's tentative distance was already calculated by E to be 2. Since the new tentative distance from B to A is much larger than just two, its distance stays at 2. For F, its distance is set to the tentative distance, since it was initialized as infinity. A's distance is smaller, so it's chosen as the next node. A's only neighbors are E and B, which have already been traversed, so all nodes around it have already been explored. The variable closest (see my code below) was initialized to a node with no other filled-in fields than a distance of infinity, so for the next iteration, it has no edges, and I get a segmentation fault.

I know that this is what happened in my code because of its output, shown below:

Current: e
New neighbor: a
New neighbor: g
g, closest, distance of 1
Current: g
New neighbor: d
d, closest, distance of 2
Current: d
New neighbor: c
c, closest, distance of 4
Current: c
New neighbor: b
New neighbor: h
b, closest, distance of 5
Current: b
New neighbor: a
New neighbor: f
a, closest, distance of 2
Current: a
?, closest, distance of 1000
Current: ?
Segmentation fault: 11

Where did I step wrong in implementing this algorithm? I have tried to follow Wikipedia's 6-step description of it very carefully. The only difference between their description and mine is that I am not using sets to keep track of explored and unexplored nodes (rather, that data is kept in the nodes themselves). Please provide any insight you can.

Note: I am compiling with Clang on a Mac with no optimization (-O0). I've noticed that with higher optimization, my program recurs infinitely and then gives me another segmentation fault, but I prioritize fixing the central problem with my algorithm before dealing with that.

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>

#define infinity 1000

struct Node {
    unsigned char value;
    int visited, distance, edge_count;
    int* weights, weight_assign_index, freed;
    struct Node** edges;
};

typedef struct Node Node;

Node* init_node(const unsigned char value, const int edge_count) {
    Node* node = malloc(sizeof(Node));

    node -> value = value;
    node -> visited = 0;
    node -> distance = infinity;
    node -> edge_count = edge_count;

    node -> weights = malloc(edge_count * sizeof(int));
    node -> weight_assign_index = 0;
    node -> freed = 0;

    node -> edges = malloc(edge_count * sizeof(Node*));

    return node;
}

void assign_edges(Node* node, const int amount, ...) {
    va_list edges;
    va_start(edges, amount);

    for (int i = 0; i < amount; i++)
        node -> edges[i] = va_arg(edges, Node*);

    va_end(edges);
}

void assign_weight(Node* node_1, Node* node_2, const int weight) {
    for (int i = 0; i < node_1 -> edge_count; i++) {
        if (node_1 -> edges[i] == node_2) {
            node_1 -> weights[node_1 -> weight_assign_index++] = weight;
            node_2 -> weights[node_2 -> weight_assign_index++] = weight;
        }
    }
}

void deinit_graph(Node* node) {
    if (!node -> freed) {
        node -> freed = 1;
        free(node -> weights);
        for (int i = 0; i < node -> edge_count; i++)
            deinit_graph(node -> edges[i]);
        free(node -> edges);
    }
}

void dijkstra(Node* current, Node* goal) {
    Node local_closest;
    local_closest.distance = infinity;
    Node* closest = &local_closest;

    printf("Current: %c\n", current -> value);

    for (int i = 0; i < current -> edge_count; i++) {
        Node* neighbor = current -> edges[i];
        if (!neighbor -> visited) {
            printf("New neighbor: %c\n", neighbor -> value);
            const int tentative_distance = current -> distance + current -> weights[i];

            if (tentative_distance < neighbor -> distance)
                neighbor -> distance = tentative_distance;

            if (neighbor -> distance < closest -> distance)
                closest = neighbor;
        }
    }
    printf("%c, closest, distance of %d\n", closest -> value, closest -> distance);

    current -> visited = 1;
    if (closest == goal) printf("Shortest distance is %d\n", closest -> distance);
    else dijkstra(closest, goal);
}

int main() {
    Node
        *a = init_node('a', 2),
        *b = init_node('b', 3),
        *c = init_node('c', 3),
        *d = init_node('d', 2),
        *e = init_node('e', 2),
        *f = init_node('f', 2),
        *g = init_node('g', 2),
        *h = init_node('h', 2);

    assign_edges(a, 2, e, b);
    assign_edges(b, 3, a, f, c);
    assign_edges(c, 3, b, h, d);
    assign_edges(d, 2, c, g);
    assign_edges(e, 2, a, g);
    assign_edges(f, 2, b, h);
    assign_edges(g, 2, e, d);
    assign_edges(h, 2, f, c);

    assign_weight(a, e, 2);
    assign_weight(a, b, 4);
    assign_weight(b, c, 1);
    assign_weight(b, f, 1);
    assign_weight(f, h, 1);
    assign_weight(h, c, 1);
    assign_weight(c, d, 2);
    assign_weight(d, g, 1);
    assign_weight(g, e, 1);


    e -> distance = 0;
    dijkstra(e, f);
    deinit_graph(a);
}

Solution

  • Read Step 6 of Wikipedia's algorithm again:

    Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.

    The "select" here means "among all the unvisited nodes in the entire graph", not just among the unvisited neighbors of the current node, which is what your code is doing. So if the unvisited node of smallest tentative distance is not a neighbor of the current node, your code goes astray. And if the current node has no unvisited neighbors at all (which is entirely possible, either with a situation like what you encountered, or more simply with a dead-end node), your code absurdly visits the local_closest node, which isn't in the graph at all and whose edges are uninitialized, naturally causing a crash.

    So you diverged from the correct algorithm sooner than the visit to A which you are focusing on. When you finished visiting D, the remaining unvisited nodes were A at tentative distance 2, C at tentative distance 4, and B,F,H at tentative distance infinity. So by the algorithm, you ought to visit A next. But instead you visit C, again because your code wrongly only considers neighbors of the current node as candidates for the next node to visit.

    Frankly I don't see how a recursive algorithm is going to be workable at all here. You need to have access to some data structure that tracks all the unvisited nodes, so that at any time you can find the one at minimum tentative distance, even if it's very far away from your current node. Your idea of keeping track of visited status on the nodes themselves has a problem with this, because you have no good way to search them all, except by going back to the starting node and doing some kind of DFS/BFS. That's (1) not possible in your current implementation, because the recursive calls to dijkstra no longer have a pointer to the starting node, and (2) very inefficient, because it's O(N) on every visit.

    There's a good reason why the Wikipedia algorithm suggests using a set here. And I think it lends itself better to an iterative than to a recursive algorithm.