Union find in python3

I know how to implement union find in general, but I was thinking of whether there would be a way to utilize the set structure in python to achieve the same result. For example, we can union sets pretty easily. But I'm not sure how to determine if two elements are in the same set using just sets. So, I am wondering if there is a data structure in python that would support such operation, other than the usual implementation?

Solution

You could always solve this problem by visualizing it as a tree and its nodes connecting to each other via the root, and then looking up the tree if you want to know if two nodes are connected. If the two nodes you are comparing has the same root (they are in the same tree), than they are connected.

To connect two nodes, just go to the root of each tree they are in, and make one root become the parent of the other.

This video will give you a great intuition about it: https://www.youtube.com/watch?v=YIFWCpquoS8&list=PLUX6FBiUa2g4YWs6HkkCpXL6ru02i7y3Q&index=1

The connection between the tree nodes can be made via pointers in a language which supports it, but if your language dont (python), than you can create your own pointers by storing positions and links via an array.

The array would be such that its positions would represent your nodes, and the values inside it represents the connection of the specific node to its root. On the beginning, the position in the array is filled with the node number because the nodes has initially no parent, but as you connect nodes, the roots changes, and the array has to represent this. Actually, the value stored there is the identificator of the root.

But try visualizing the problem visually first instead of thinking of arrays and too much mathematical artificats. Visually dealing with it makes the solution sound banal, and can be a good guidance while writing code.

I say this because I have watched the video from Robert Sedgewick I just posted, with a graphical simulation of the solution, and implemented myself without paying too much attention to the code on his book. The intuition the video gave me is much more valuable than any mathematics.

It will help you to encapsulate the nodes into a class, with the following methods:

climbTreeFromNodeUpToRoot
setNewParentToThisNodeAndUpdateHeights

The first method, as the name says, takes you from a node and goes up the tree until finding the root of it, which is then returned.

If you compare two nodes with this method (actually, the roots returned by it), you know easily if they are connected by just comparing their roots.

Once you want to connected them, you go up the trees of both nodes, and ask one root to take the other one as its parent.

The trees can grow very big in height (sorry I dont use the official nomeclature, but this is the one that makes sense to me), so this simple approach will get very slow when you have to climb the tree at a later time.

To prevent trees from becoming to high, dont just set one root as the parent to another without criterium, but attach the smallest tree (in terms of height, not quantity of elements) to the highest one.

For this, you need to know the heights of each tree, and this information you can store on their respective root (via an extra array in your case, or an extra pointer from each node in other languages). This information should be updated everytime another tree connects to it.

It is not possible for a tree to know that she just got a new tree attached to it, so its important that every tree attaching to a second one informs the second as to update its height.

This information can be sent to the root of the second tree, and later used to judge (as writen before) which tree is the smallest. Remember, attaching a small tree to a big one instead of the opposite will save you incredible amounts of time.