Search code examples
c++algorithmdisjoint-setsunion-find

Do the order of edges matter in union find?


I am learning and to understand it better, I have written a small program:

#include <iostream>
#include <vector>
#include <numeric>
using namespace std;
 
vector<int> parent, sz;
 
int find(int i) {
    if(parent[i]==i) return i;
    return parent[i]=find(parent[i]);
}
 
void merge(int i, int j) {
    int p1=find(i);
    int p2=find(j);
 
    if(p1==p2) return;
    if(sz[p1]<sz[p2]) {
        parent[p1]=p2;
        sz[p2]+=sz[p1];
    } else {
        parent[p2]=p1;
        sz[p1]+=sz[p2];
    }
}
 
int main() {
    vector<vector<int>> allowedSwaps={{0,4},{4,2},{1,3},{1,4}};
 
    int n=5;    //hard-code for now
    sz.resize(n,1);
    parent.resize(n);
    iota(begin(parent),end(parent),0);
 
    cout<<"Parents before: \n";
    for(auto& e: parent)
        cout<<e<<" ";
    cout<<"\n";
 
    for(vector<int>& currswap: allowedSwaps) {
        merge(currswap[0],currswap[1]);
    }
 
    cout<<"Parents after: \n";
    for(auto& e: parent)
        cout<<e<<" ";
    cout<<"\n";
 
    return 0;
}

allowedSwaps essentially denotes all the components that are connected to each other. For the code above, all of them are connected.

However, if you notice the output:

Parents before: 
0 1 2 3 4 
Parents after: 
0 0 0 1 0 

It shows that one of them (3, 0-indexed) is not connected. I think that is because when we process the edge {1,3}, both the vertices 1 and 3 have not been connected to the larger component yet (they are later, by {1,4}) and so Union Find determines that it is a different component.

Does this mean that the order of edges matter for Union find? Or, am I missing something?


Solution

  • The output you got does not signify that one node is disconnected.

    The parent data structure represents links from one node to another (or itself, when it is a root).

    At the start you have this:

    enter image description here

    And at the end we have this:

    enter image description here

    The important thing here is that there is one tree. It is not required that all nodes are linked directly to the root, just that they have a path to the root, which is also true for the node with index 3.

    NB: if you would call find(3), then also index 3 would receive value 0. It is just something that gets optimised by calling find, but it doesn't change the meaning of result.