Search code examples
algorithmgraphtreetime-complexityminimum-spanning-tree

Time complexity of Prim's MST algorithm problem


I have a graph problem where we have:

  1. There are exactly n vertices and n-1 edges.
  2. All vertices are connected with each other – i.e. the network consists of just one connected component. The network is thus a tree.
  3. All edges have positive length (strictly greater than 0). All edges can carry traffic in both directions.
  4. I am given the shortest path distance between each pair of vertices.

More formally: Let the actual vertice network be a tree T. Given just the shortest path distances of T, you have to reconstruct the original network T.

Input: An n × n distance matrix H with Hi,j = δT (i,j), where T is the actual network of vertices and δT is the shortest path distance between vertice i and j in T.

Output: The n −1 edges of T.

Example:

•T is the actual vertice network.

•H is the n × n shortest path distance matrix.

•G(H) is the complete graph on n nodes, where edge (i,j) has weight Hi,j – i.e. the shortest path distance in T.

My question about Time Complexity:

What is the running time of the algorithm resulting from running the Prim algorithm on the input and returning the list of edges as a function of n? (Note that |E(G(H))|= Θ(n2)). Should Amortized analysis be used here? Im not really sure.


Solution

  • The time complexity of Prim's algorithm using the adjacency matrix of a complete graph is Theta(n^2). We can see this from the pseudocode of Prim's algorithm with our adjacency matrix H:

    1. Initialize a set Q of vertices not in the tree, initially all vertices. Choose the first vertex to be our root R.
    2. Initialize two arrays of length n, key and parent. key will store, at position i, the minimum weight edge connecting the ith vertex to the current MST; initially, this is +infinity. parent will store, after the algorithm is done, the parent of each vertex in the MST rooted at R. Initially, parent[i] = R for all i, except the root, which has no parent.
    3. Loop over the first row of H (corresponding to the root R) and assign key[i] = H[0][i]. Remove R from our set Q.
    4. While Q is not empty:
      • Loop over Q, and extract any vertex u with minimum key[u]
      • For each vertex v from 0 to n-1:
        • If v in Q and H[u][v] < key[v]:
          • Set key[v] = H[u][v]
          • Set parent[v] = u

    Here, the loop in (4) runs n-1 times, and inside the loop, we do Theta(n) work. In total, that gives a runtime of Theta(n^2), which is optimal for any algorithm that needs to read the entire adjacency matrix. In particular, for a generic complete graph, this is optimal, but this doesn't imply that Prim's algorithm is optimal for your specific case with a narrower class of graphs formed from distance matrices.


    To show that your problem transformation is correct, we need to verify that, given a tree T with positive weights, the complete graph G(H) formed by taking the distance matrix of T as an adjacency matrix will satisfy:

    1. G(H) has a unique minimum spanning tree
    2. T is a minimum spanning tree of G(H).

    This requires proving several properties of minimum spanning trees in general. One theorem about minimum spanning trees, proven as Corollary 3.5 in these MIT lecture notes, says that:

    Let G = (V,E,w) be a connected, weighted, undirected graph. Let T be any MST and let (U, V \ U) be any cut. Then T contains a light edge for the cut.

    Here, a 'light edge for a cut (U, V \ U)' means an edge whose weight is the minimum weight of all edges with exactly one endpoint in U.

    Now, we just need to choose appropriate cuts to prove what we want. For an arbitrary edge e in your original tree T, consider the two trees T1 and T2 we get by deleting e.

    Take the vertices of T1, which we'll call V(T1), as our cut. We need to show that in the complete graph G(H), the edge e is the unique light edge for that cut. In our original tree T, e is the only edge that crosses the cut. This means that any path with one endpoint u in V(T1) and the other endpoint v in V(T2) must include e.

    Since all the weights are positive, this means that the distance in T, distance(u,v) > weight(e), for any u, v such that (u in V(T1), v in V(T2), and (u, v) != e). Since the distance in T between u and v is the weight of the edge (u, v) in our complete graph G(H), this means that e is the unique minimum weight edge that crosses the cut. Since e was an arbitrary edge in T, this now means that all edges of T must be in our MST for G(H), so the unique MST of G(H) is T.