Using scipy hierarchical clustering with objects


I have a list of objects, and a distance metric between objects. Can I use scipy's hierarchical clustering to cluster the objects (fclust1 seems to only accept vectors of floats)?

Alternatively, if this is not possible in scipy, is there another Python library in which this can be done?

Example:

 import random

 class MyObject(object):

     def __init__(self):
         self.vec1 = [random.choice(range(100)) for i in range(1000)]
         self.vec2 = [random.choice(range(100)) for i in range(1000)]

 def my_distance_metric(a1, a2):
     # some scalar function of a1.vec1, a1.vec2, a2.vec1, a2.vec2
     return ...

 objects = [MyObject() for i in range(1000)]
 fclust1.cluster(objects, metric=my_distance_metric)

Thanks.


Solution

  • You can compute the condensed distance matrix of your objects and pass it to scipy.cluster.hierarchy.linkage to compute the linkage matrix. Then pass the linkage matrix to, say, scipy.cluster.hierarchy.fcluster or scipy.cluster.hierarchy.dendrogram.

    For example,

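    from scipy.cluster.hierarchy import linkage, dendrogram

    # Pairwise distances for every j < k, in condensed (upper-triangle) order
    n = len(objects)
    condensed_dist = [my_distance_metric(objects[j], objects[k])
                      for j in range(n)
                      for k in range(j + 1, n)]

    Z = linkage(condensed_dist)  # linkage matrix; method defaults to 'single'
    dendrogram(Z)                # plotting requires matplotlib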
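
If flat cluster labels are wanted rather than a plot, the same condensed distances can go through linkage and then fcluster. The following is only a minimal sketch under stated assumptions: the metric (summed absolute differences over both vectors) is a placeholder, since the question does not specify one; the object and vector counts are reduced to keep the run short; and the choices of "average" linkage and a cut into at most 4 clusters are purely illustrative.

```python
import random

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


class MyObject:
    """Toy object holding two integer vectors (mirrors the question)."""

    def __init__(self, rng):
        self.vec1 = [rng.randrange(100) for _ in range(20)]
        self.vec2 = [rng.randrange(100) for _ in range(20)]


def my_distance_metric(a1, a2):
    # Placeholder metric (an assumption, not from the question):
    # summed absolute differences over both vectors.
    d1 = sum(abs(x - y) for x, y in zip(a1.vec1, a2.vec1))
    d2 = sum(abs(x - y) for x, y in zip(a1.vec2, a2.vec2))
    return float(d1 + d2)


rng = random.Random(0)  # fixed seed so the run is repeatable
objects = [MyObject(rng) for _ in range(30)]
n = len(objects)

# Condensed form: the upper triangle of the distance matrix, row by row.
condensed_dist = np.asarray(
    [my_distance_metric(objects[j], objects[k])
     for j in range(n)
     for k in range(j + 1, n)],
    dtype=float,
)

# "average" is an illustrative choice; linkage defaults to "single".
Z = linkage(condensed_dist, method="average")

# Cut the tree into at most 4 flat clusters; labels[i] belongs to objects[i].
labels = fcluster(Z, t=4, criterion="maxclust")
```

The condensed list has n * (n - 1) / 2 entries, so with the 1000 objects from the question the metric is called 499,500 times; if the metric is expensive, that loop is likely to dominate the running time.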