Search code examples
algorithmcluster-analysisdbscan

How to implement DBSCAN clustering algorithm?


I'm trying to implement DBSCAN but I can't understand the idea behind it. If it goes through the whole data 1 by 1 and creates a new cluster for close neighbors, then i'll always get a lot of clusters. Say it checks close neighbors of 1st point, finds enough neighbors (>MinPts), creates a cluster for them, moves to the next point, check for neighbors (might also find neighbors that are already in a cluster) and create a new cluster for them. and so on. So some points will be added to more than 1 cluster... Thus a lot of clusters will be created.

Can someone please explain how this algorithm works? I didn't find much information about it online.


Solution

  • No, having created a cluster, you then look at the points in the cluster, and if any of them have sufficient density to have their own cluster, then all the points in that "proto-cluster" are added to the original cluster. In this way it continues agglomerating, until there is a decrease in density such that points are added which lack sufficient neighbouring points to continue the process. When this is done, all the points in that cluster are market "closed" and the the process restarts looking for a new cluster in the remaining "open" nodes.