Search code examples
pythonmachine-learningcluster-analysissocial-networkinghierarchical-clustering

Friends-of-friends clustering algorithm (Python)


I would like to implement in Python a "friends-of-friends" algorithm, in which, for a set of points in a N-dimensional space (two-dimensional, in my case), two points are said to be "friends" if they are closer than a given linking length, and a friend of a friend also being a friend (i.e., if A is a friend of B and B of C, A is also a friend of C). Then, all the points that are friends among them are collected into a single cluster, leading to some final number of clusters. The motivation is that I typically have strongly clustered points, with different clusters far away from each other. I would like to allow for an arbitrary metric in the distance calculation (i.e., not necessarily the Euclidean metric).

I could write this down from scratch, but I was wondering whether it can be easily implemented using existing libraries or some smart array-based Python.


Solution

  • What you are looking for is agglomerative clustering (also known as hierarchical clustering) with a single linkage.

    Agglomerative clustering is an algorithm in which you generate a tree of clusters by merging them, starting from single points.

    The linkage parameter defines how you merge those clusters. Single linkage means that in each step you will merge two clusters whose minimum distance between their points is the smallest amongst all cluster pairs. You can then pick a cutoff point and therefore end up with what you are requesting.

    It is implemented in scikit-learn, and you can use many distance metrics or provide a custom distance matrix.

    https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html