Search code examples
machine-learningunsupervised-learningdbscan

If a DBSCAN algorithm is working correctly, is it possible to result in a cluster with less than minPoints members?


I am new to using the DBSCAN algorithm.

Quick summary; it has two parameters:

  1. epsilon - to specify the acceptable "distance" between two points, under which they can be considered close enough to cluster.
  2. minPoints - to specify the minimum number of points that must fall with the distance epsilon to constitute a cluster. If there aren't enough points together, it's just labelled as noise.

I'm using somebody else's DBSCAN algorithm and I have the source code, which I only kind of understand. I was hoping I could use it as-is but then I found some behaviour I wasn't expecting.

I specified a value of 6 for minPoints and yet in my results I got a cluster with only 2 points.

From debugging, I think I can see what's happening. It looks like when the point is examined it has 16 neighbours at a distance below epsilon, so it should qualify as a cluster. Later on, the algorithm sees that 14 of those neighbours had already been assigned to a different cluster, so this cluster ends up with only 2 points in it.

Is minPoints supposed to be strictly enforced?

Is this how a healthy DBSCAN algorithm is supposed to work or is it a bug I need to fix before proceeding?


Solution

  • For anybody who finds this question through Google, the correct answer is Yes. It is possible for a correctly functioning DBSCAN implementation to create a cluster with less than minPoints members. It is explained more fully on this other Stack Overflow question:

    Can the DBSCAN algorithm create a cluster with less than minPts?