Search code examples
algorithmcluster-analysis

How to find intersection between two (or more) clusterings of the same dataset


Say that, given a dataset X = (x_1, ..., x_n) with n instances of dimensionality d, I cluster all instances in X using two different clustering algorithms. This will result in two separate clusterings of the same dataset, C' and C''.

Is there a way to find the intersection between those two clusterings? That is, a third clustering C which considers (x_i, x_j) to be in the same cluster iff (x_i, x_j) belong to the same cluster both according to C' and to C''. (And if so, what is its complexity?)

In other words: C(x_i) = C(x_j) iff [C'(x_i) = C'(x_j) and C''(x_i) = C''(x_j)]

Moreover, if such a method exists, does it also extend to the case where the number of clusterings to compare is greater than two?


Solution

  • What you can do is to create a graph where the instances are the nodes and there are edges between the nodes if C'(x_i) = C'(x_j) and C''(x_i) = C''(x_j). This can be done in O(n^2).

    Then you can find the connected components of the graph. This is also O(n^2).

    The connected components of the graph are your final clusters.