dataset cluster-analysis minimum-spanning-tree unsupervised-learning

Datasets for minimum spanning tree clustering


I recently came across the idea of the Minimum Spanning Tree (MST) and found out that it has an application in clustering. I'm looking for a real-world dataset (preferably clean) that can be used as a data source for various clustering algorithms. I have read that MST clustering works reasonably well on both spherical and non-spherical data, so I am looking for non-spherical datasets as well.

The datasets I have in mind should contain ground-truth labels, so that the effectiveness of the various algorithms can be measured by something other than the within-cluster sum of squares (WSS).


Solution

  • Minimum spanning tree clustering is standard and well studied.

    It just goes by a different name.

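    Single-link hierarchical clustering is equivalent to clustering on the minimum spanning tree: removing the k-1 longest MST edges gives the single-link clustering into k clusters. The fast SLINK algorithm is closely related to Prim's algorithm.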

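    The weaknesses are also well understood, most notably the chaining effect, where a few points bridging two groups cause them to merge. You can use almost any data set, for example the common Iris data set, which comes with class labels.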
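To make the equivalence above concrete, here is a minimal sketch in plain Python. It is an illustration, not a reference implementation. It builds the MST with Kruskal's algorithm and stops once k components remain, which is the same as removing the k-1 longest MST edges. It then compares the result against ground-truth labels using the (unadjusted) Rand index. The function names `mst_clusters` and `rand_index` are hypothetical, and the eight points are made-up toy data standing in for a real labelled data set such as Iris.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def mst_clusters(points, k):
    """Single-link / MST clustering: run Kruskal's algorithm and stop
    as soon as only k connected components remain."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (O(n^2) memory, fine for small data).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        if components == k:
            break  # the remaining MST edges are the k-1 longest ones
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data (assumption, not a real data set): two elongated, non-spherical rows.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 3), (1, 3), (2, 3), (3, 3)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

labels = mst_clusters(points, k=2)
print(labels)                     # [0, 0, 0, 0, 1, 1, 1, 1]
print(rand_index(labels, truth))  # 1.0
```

In practice you would probably use a library instead of this hand-rolled version. For example, SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` computes the single-link hierarchy. A chance-corrected score such as the adjusted Rand index is also usually preferred over the plain Rand index shown here.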