In this case, a picture is worth a thousand words.
Hello Purple Cluster!
How did this come about? First, to describe the data and the settings for DBSCAN:
r0 (x-distance)
Everything besides that rogue pair of purple points looks precisely as desired. How did that purple cluster jump clear over the Yellow Wall and claim that rogue pair of points in the top left?
Update: It has been verified that there are exactly three clusters, i.e. this is not a bug in choosing three colors to denote four clusters. The verification came directly from the DBSCAN predicted outputs (not some hypothesis of mine). Here it is:
NumClusters is 3 counts are (array([-1, 0, 1]), array([ 8, 67, 25]))
Another update: To clarify, the 2 purple points are being added to the far right cluster (also in purple). They are not a fourth cluster. So the question is: why are those points being added to the furthest cluster instead of the nearby green and yellow ones?
This is interesting. I added the cluster number and we see that for the purple points it is -1.
That means NO CLUSTER. So that's how the pair in the upper left can "share" the same "cluster" with the ones on the far right: they do not actually share a cluster, just the identifier that scikit-learn uses for NO cluster (noise).
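To make the point concrete, here is a minimal sketch of how the labels could be summarized so that noise is not counted as a cluster. The helper name `summarize_labels` and the `labels` list are hypothetical, made up for illustration; the list simply reproduces the counts from the output above (8, 67, 25) and is not the actual data.

```python
from collections import Counter

NOISE_LABEL = -1  # scikit-learn's DBSCAN assigns -1 to noise points


def summarize_labels(labels):
    """Separate real clusters from noise in DBSCAN-style labels.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    counts = Counter(int(label) for label in labels)
    # Remove the noise entry so it is not counted as a cluster.
    noise_count = counts.pop(NOISE_LABEL, 0)
    return {
        "n_clusters": len(counts),
        "cluster_sizes": dict(sorted(counts.items())),
        "n_noise": noise_count,
    }


# Hypothetical labels with the same counts as the output above:
# 8 noise points, 67 points in cluster 0, 25 points in cluster 1.
labels = [-1] * 8 + [0] * 67 + [1] * 25
summary = summarize_labels(labels)
print(summary)
```

With a fitted estimator, the same helper could be given its `labels_` attribute. Read this way, the three distinct label values correspond to two real clusters plus a noise group, so drawing the -1 points in a neutral color (for example grey) would keep them from looking like one spread-out cluster.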