Search code examples
algorithmunion-find

Find the number of connected components in undirected graph using Union Find


I've been staring at this algorithm for a while but couldn't spot what I've missed.

Logic:

  1. Init all isolated components
  2. union frm, to component in each edge and decrement # components if those components aren't already connected.

Question: how come that I get this -48 when running this particular test case?

    def countComponents(self, n, edges):
        """
        :type n: int
        :type edges: List[List[int]]
        :rtype: int
        """
        roots = [i for i in range(n)]  # track disconnected components

        # n isolated islands
        for u, v in edges:

            root1 = self.findRoot(roots, u)
            root2 = self.findRoot(roots, v)

            if root1 != root2:
                roots[root1] = root2
                n -= 1
        #print(roots)
        return n

    def findRoot(self, roots, id):
        oid=id
        if roots[id] != id:
            # roots[id] = roots[roots[id]]
            id=roots[id]

        roots[oid]=id # path compression
        return id

Test case:

n=100
edges=[[6,27],[83,9],[10,95],[48,67],[5,71],[18,72],[7,10],[92,4],[68,84],[6,41],[82,41],[18,54],[0,2],[1,2],[8,65],[47,85],[39,51],[13,78],[77,50],[70,56],[5,61],[26,56],[18,19],[35,49],[79,53],[40,22],[8,19],[60,56],[48,50],[20,70],[35,12],[99,85],[12,75],[2,36],[36,22],[21,15],[98,1],[34,94],[25,41],[65,17],[1,56],[43,96],[74,57],[19,62],[62,78],[50,86],[46,22],[10,13],[47,18],[20,66],[83,66],[51,47],[23,66],[87,42],[25,81],[60,81],[25,93],[35,89],[65,92],[87,39],[12,43],[75,73],[28,96],[47,55],[18,11],[29,58],[78,61],[62,75],[60,77],[13,46],[97,92],[4,64],[91,47],[58,66],[72,74],[28,17],[29,98],[53,66],[37,5],[38,12],[44,98],[24,31],[68,23],[86,52],[79,49],[32,25],[90,18],[16,57],[60,74],[81,73],[26,10],[54,26],[57,58],[46,47],[66,54],[52,25],[62,91],[6,72],[81,72],[50,35],[59,87],[21,3],[4,92],[70,12],[48,4],[9,23],[52,55],[43,59],[49,26],[25,90],[52,0],[55,8],[7,23],[97,41],[0,40],[69,47],[73,68],[10,6],[47,9],[64,24],[95,93],[79,66],[77,21],[80,69],[85,5],[24,48],[74,31],[80,76],[81,27],[71,94],[47,82],[3,24],[66,61],[52,13],[18,38],[1,35],[32,78],[7,58],[26,58],[64,47],[60,6],[62,5],[5,22],[60,54],[49,40],[11,56],[19,85],[65,58],[88,44],[86,58]]

Solution

  • The error lies in the findRoot function. You're currently checking if the node supplied (id) is the parent of itself, i.e., is the representative element of its set. If it isn't, you assume that its parent is the representative, and return it. But that isn't necessarily true, even if you're using path compression.

    Say you have 4 nodes A, B, C, and D. Edges are: A -> B, B -> C, A -> D.

    You first set A's parent to be B, then B's parent to be C. It's fine till here. But when you arrive at processing the last edge, you call findRoot(..., A). It checks that A isn't its own parent, and so returns its parent B. But, the representative element of A's set isn't even B, it's C. You can see that B isn't its own parent as well. The component count doesn't mess up here, but you can see how it can produce wrong results.

    The only change needed is that you have to keep looking for the representative element, and not just return after 1 check.

    Your new method will be along the lines of:

    def findRoot(self, roots, id):
        oid=id
        while roots[id] != id: # Keep looking for the representative
            id=roots[id]
    
        roots[oid]=id
        return id
    

    This change results in counting 6 connected components in your input data. Confirmed with DFS.

    To implement path compression, set the parent of each of the nodes you encountered in this loop to the final value.