Search code examples
algorithmtreedisjoint-setsunion-find

How do path compression and union by rank complement each other?


I have been reading about union-find problem. The two main improvements are path compression and union by rank. As far as I understand union by rank is used to determine how to combine disjoint trees. If we have two disjoint trees T1 and T2, then we attach the root of the tree with smaller rank to the tree with higher rank. If we don't use path compression then rank is just the depth of a tree. This makes sense since we don't want to increase the depth of out tree,since it directly affects both finds and unions.

My problem is when we use path compression as well. I keep reading that the two optimizations complement each other, but I don't see that. Because of path compression, rank is no longer depth of the tree (it becomes an upper bound on depth). Suppose T1 has 2 branches (let rank of T1 be 3), and T2 has depth 2 and rank 2. Now suppose we perform find operation (with path compression) on the leaf of T1 marked with "*" below. Now if we union root of T1 and root of T2, then T2 would be attached to the root of T1(since rank is not updated by find). The resulting tree has depth 3. But we could have had a better performance if we attached T1 to T2.

T1:   o   (Rank = 3)    T2:   o  (Rank = 2)
     / \                      |
    o   o                     o
    |                         |
    o                         o
    |   
    *   

After a find on leaf of T1("*"), then union on roots of T1 and T2 we get

T1:    o       (Rank = 3)      T2: o  (Rank = 2)       
     /| |\                         |
    * o o o                        o
                                   |
                                   o
Result of T1 union T2
       o
   / | | |\
  * o o o  o   Rank = 3 and Max Depth = 3
           |
           o
           |
           o

Am i missing something here? How do path compression and union by rank complement each other? I know rank is just an upper bound on depth of a tree but I don't see how union by rank is improving the overall performance of the structure. How is this better than a union that combines the roots randomly?

Thanks for the help in advance.


Solution

  • Union by rank ensures that the maximum depth of the tree is log N, so it puts a worst case upper bound of O(log N) on every operation.

    Path compression without any special union rules an upper bound of O(log N) on the amortized cost of each operation, but doesn't limit the worst case cost. (There might even be a tighter bound on the amortized cost, but O(log N) is the one I know how to prove)

    Using both together, you get the O(log N) limit on the worst case and the amortized bound improves to O(𝛼(N)), which is effectively constant. In this way the two optimizations are complementary.

    You are right that there are sequences of operations for which union by rank is not absolutely optimal, but the guarantees are better than what you get without it, and that is what counts. We don't usually spend effort optimizing the best case performance. We optimize the worst or average cases.