I am new to using the DBSCAN algorithm.
Quick summary; it has two parameters:
epsilon
- to specify the acceptable "distance" between two points, under which they can be considered close enough to cluster.minPoints
- to specify the minimum number of points that must fall with the distance epsilon
to constitute a cluster. If there aren't enough points together, it's just labelled as noise.I'm using somebody else's DBSCAN algorithm and I have the source code, which I only kind of understand. I was hoping I could use it as-is but then I found some behaviour I wasn't expecting.
I specified a value of 6 for minPoints
and yet in my results I got a cluster with only 2 points.
From debugging, I think I can see what's happening. It looks like when the point is examined it has 16 neighbours at a distance below epsilon
, so it should qualify as a cluster. Later on, the algorithm sees that 14 of those neighbours had already been assigned to a different cluster, so this cluster ends up with only 2 points in it.
Is minPoints
supposed to be strictly enforced?
Is this how a healthy DBSCAN algorithm is supposed to work or is it a bug I need to fix before proceeding?
For anybody who finds this question through Google, the correct answer is Yes. It is possible for a correctly functioning DBSCAN implementation to create a cluster with less than minPoints members. It is explained more fully on this other Stack Overflow question:
Can the DBSCAN algorithm create a cluster with less than minPts?