python machine-learning cluster-analysis distance dbscan

Finding clusters with difference in value <0.1 in dbscan


Hi, I need to cluster points whose values differ by 0.1 or less. My use case is as follows.

0     1649.500000
1        0.864556
2        0.944651
3        0.922754
4        0.829045
5        0.838665
6        1.323263
7        1.397340
8        1.560655
..       .......
27       1.315072
28       1.593657
29       1.222322
...      .......
...      .......
2890     0.151328
2891     0.149963
2892     0.149285
2893     0.146318
2894     0.147668
2895     0.141159

I need to cluster the points above. I passed the data to DBSCAN as follows:

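import numpy as np
from sklearn.cluster import DBSCAN
# X holds the 1-D values shown above; reshape them into a single feature column
X = X.reshape(-1, 1)
db = DBSCAN(eps=0.1, min_samples=3, metric='manhattan', n_jobs=-1).fit(X)
labels = db.labels_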

When I then print the points that belong to each cluster like this:

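# assumed definition of n_clusters_: number of clusters, excluding noise (label -1)
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
for i in range(n_clusters_):
    print("Cluster {0} include {1}".format(i, list(np.where(labels == i))))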

My output is as follows:

Cluster 0 include [array([   1,    2,    3, ..., 2893, 2894, 2895])]

As you can see in the data above, position 1 has a value of 0.8... and position 2895 has a value of 0.141... How can they end up in the same cluster when I set eps=0.1 and metric="manhattan" (which takes the absolute difference)? What am I missing here? Should I use some other distance? Is my understanding of eps wrong? What should I do to get the clustering I want?


Solution

  • This is exactly how DBSCAN is supposed to work.

    DBSCAN is a density-based clustering algorithm. Put simply, it picks an unvisited point p; if at least min_samples points (counting p itself) lie within distance eps of p, then p is a core point. Core points that lie within eps of each other are put in the same cluster, and any non-core point within eps of a core point joins that cluster as a border point.

    This means that two points far apart (farther than eps) can still belong to the same cluster if a chain of core points, each within eps of the next, connects them. In other words, eps limits the distance between neighbouring points, not the spread of the whole cluster.

    The eps and min_samples values you chose seem to result in one big cluster (with the exception of point 0).
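
    The chaining effect described above can be reproduced on a small example. What follows is only a rough sketch, not scikit-learn's implementation: dbscan_1d is a hypothetical helper written for illustration, the data is made up, and it assumes 1-D values with absolute difference as the distance (which is what the Manhattan metric reduces to for a single feature). Like scikit-learn, it counts a point as its own neighbour when checking min_samples.

```python
def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN sketch for 1-D data (hypothetical helper, not sklearn).

    Returns one label per value: 0, 1, ... for clusters, -1 for noise.
    """
    n = len(values)
    # Neighbourhood of each point: every point within eps, including itself.
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbours]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1 or not is_core[start]:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[start] = cluster
        stack = [start]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Made-up data: 21 points spaced 0.05 apart, plus one far-away outlier.
values = [round(0.05 * k, 2) for k in range(21)] + [1649.5]
labels = dbscan_1d(values, eps=0.1, min_samples=3)

# All 21 chained points share label 0; the outlier is noise (-1).
print(labels)
# The cluster spans 1.0, ten times eps, because each point is within eps of the next.
span = max(values[:21]) - min(values[:21])
print(span)
```

    Every neighbouring pair here is only 0.05 apart, so each point is a core point and the cluster keeps growing from 0.0 all the way to 1.0. The same thing likely happens in the question's data: the values between 0.14 and 0.94 are dense enough that every step from one point to the next stays within eps.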