cluster-analysis k-means

Clustering points when not all points are in a cluster


I have a group of coordinates, plotted below. I would like to cluster the overlapping points (the ones circled in red) together, but I want all the other, non-overlapping points (the ones not circled in red) to be ignored. I cannot use k-means clustering, since that would assign every point to a cluster, including the ones I want ignored. How might I go about this? Thanks

Desired Output:

enter image description here

Input:

enter image description here


Solution

  • There is not just k-means. You are missing 50 years of research if all you consider is k-means.

    For example, DBSCAN has the concept of noise points that don't belong to any cluster.

    In your case, however, you aren't actually looking for clustering.

    Instead, you want to perform a similarity self-join because, as far as I can tell, you want to match pairs of points. It is a special kind of join. There is no standard syntax for this, but think of it as SELECT a.p, b.p FROM data AS a JOIN data AS b WHERE distance(a.p, b.p) < threshold.
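
The self-join described above could be sketched in plain Python roughly as follows. This is only an illustration under assumptions: the function name `similarity_self_join`, the sample coordinates, and the threshold of 0.5 are all made up for the example, and the brute-force comparison of every pair is quadratic in the number of points. Each unordered pair is visited once, so a point is never matched with itself.

```python
from itertools import combinations
from math import dist


def similarity_self_join(points, threshold):
    """Return index pairs (i, j) with i < j whose points are closer than threshold."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) < threshold  # Euclidean distance between the two points
    ]


# Hypothetical sample data: two near-overlapping pairs and two isolated points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (9.0, 1.0), (2.0, 8.0)]

pairs = similarity_self_join(points, threshold=0.5)

# Indices of points that take part in at least one match; the rest are ignored.
matched = sorted({i for pair in pairs for i in pair})

print(pairs)    # [(0, 1), (2, 3)]
print(matched)  # [0, 1, 2, 3]
```

Points that never appear in `pairs` are simply left out, which is the "ignore everything else" behaviour the question asks for. For larger data sets, a spatial index would likely be a better fit than comparing all pairs.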