Search code examples
algorithmsetpriority-queuedijkstra

Understanding time complexity of Dijkstra priority queue vs set implementation


Consider the following two implementations of dijkstra's algorithm:

Using set:

// V - total number of vertices
// S - source node
void dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    
    set<pair<int, int>> s;
    for (int  i=0; i<V; i++)
        s.insert({dist[i], i});    // log(V) time
    
    while (!s.empty())    // exactly V times
    {
        auto top = *(s.begin());    // constant time
        int dis = top.first;
        int node = top.second;
        s.erase(top);               // amortised constant time

        for (auto it: edge[node])    // For all the iterations of outer while loop this will sum up to E, where E is the total number of edges in the graph
        {
            int nb = it[0];
            int edge_weight = it[1];
            
            if (dist[nb] > dis + edge_weight)
            {
                s.erase({dist[nb], nb});    // log(V) time
                dist[nb] = dis + edge_weight;
                s.insert({dist[nb], nb});    // log(V) time
            }
        }
    }
}

Using priority queue:

// V - total number of vertices
// S - source node
void dijkstra(int V, vector<vector<int>> edge[], int S)
{
    vector<int> dist(V, 1e9);
    dist[S] = 0;
    
    priority_queue<pair<int, int> , vector<pair<int, int>>, greater<pair<int, int>>> pq;
    pq.push({dist[S], S});
    
    while (!pq.empty())    // Can be more than V times, let's call this heap_size times
    {
        int node = pq.top().second;    // O(1) time
        pq.pop();                      // log(heap_size) time
        
        for (int i=0; i<edge[node].size(); i++)    // Can be approximated to (E/V), where E is the total number of edges in the graph
        {
            int nb = edge[node][i][0];
            int edge_weight = edge[node][i][1];

            if (dist[nb] > dist[node] + edge_weight) {
                dist[nb] = dist[node] + edge_weight;
                pq.push({dist[nb], nb});    // log(heap_size) time
            }
        }
    }
    return dist;
}

Finding the time complexity using set based approach is easy as the number of elements in set is exactly V(number of vertices) and the inner for loop runs for every edge, so it's time complexity is O(V*log(V) + V + E*log(V)) which is equivalent to O(E*log(V)) (reason: What's the time complexity of Dijkstra's Algorithm)

But I have trouble in understanding the time complexity of the priority_queue approach. Here the same node can be present multiple times in the priority_queue with different distances. How do I calculate the upper bound on the number of nodes that are added to the heap?

I also want to decide which implementation to use based on the nature of the graph(sparse vs dense), are both these implementations equivalent for any type of graph?


Solution

  • Your priority_queue version isn't quite right.

    Your while loop should start like this:

        auto nextPair = pq.top();
        pq.pop();
        if (dist[nextPair.second] != nextPair.first) {
            continue;
        }
        int node = nextPair.second;
    

    This ensures that each vertex is processed only once, when its current record is popped from the priority queue.

    The complexity analysis then becomes easy, since each edge will then be processed at most once, and there are then at most |E| inserts into the priority queue.

    Total complexity is then O(E log E), and since E < V2, that's the same as O(E log V).

    The major disadvantage of the priority queue method is that it can consume O(E) space. This is usually OK, since it's on par with the space consumed by the graph itself. Since the priority_queue is a lot faster than set, the priority_queue version is the way that it is commonly done in practice.