Tags: python, scikit-learn, cluster-analysis, dbscan

scikit-learn DBSCAN: determining eps and min_samples values


I have been trying to run DBSCAN using scikit-learn, but so far I have failed to find values of eps and min_samples that give me a reasonable number of clusters. I computed the average value in the distance matrix and tried eps values on either side of that mean, but I still get only one cluster:

Input:

from sklearn.cluster import DBSCAN

# X is the matrix of feature vectors (not a distance matrix)
db = DBSCAN(eps=13.0, min_samples=100).fit(X)
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('Estimated number of clusters: %d' % n_clusters_)

Output:

Estimated number of clusters: 1

Input:

db = DBSCAN(eps=27.0, min_samples=100).fit(X)

Output:

Estimated number of clusters: 1

Some other information:

The average distance between any two points in the distance matrix is 16.8354.
The minimum distance is 1.0.
The maximum distance is 258.653.

Note that the X passed in the code is not the distance matrix but the matrix of feature vectors. How do I determine these parameters?


Solution

  • Try lowering the min_samples parameter. In DBSCAN, a point counts as a core point only if at least min_samples points (including itself) lie within eps of it, so this parameter effectively sets the minimum density, and roughly the minimum size, of any cluster. The clusters in your data may all be small, in which case min_samples=100 is too high for them to form.
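
One way to act on this suggestion is to fit DBSCAN over a range of min_samples values and compare the cluster counts. The following is a minimal sketch, not a definitive recipe. The helper names count_clusters and sweep_min_samples are hypothetical, and X_demo is synthetic data from make_blobs standing in for your X; the eps value and the candidate list are illustrative assumptions that would need adapting to your data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def count_clusters(labels):
    """Number of clusters in labels, ignoring the noise label -1."""
    return len(set(labels)) - (1 if -1 in labels else 0)


def sweep_min_samples(X, eps, candidates):
    """Fit DBSCAN once per min_samples value (hypothetical helper).

    Returns a dict mapping min_samples -> (n_clusters, n_noise_points).
    """
    results = {}
    for m in candidates:
        labels = DBSCAN(eps=eps, min_samples=m).fit(X).labels_
        results[m] = (count_clusters(labels), int(np.sum(labels == -1)))
    return results


# Assumption: synthetic stand-in for the real feature matrix X.
# Three well-separated blobs of 100 points each.
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Assumption: eps=1.5 suits this toy data only; it is not a recommendation.
results = sweep_min_samples(X_demo, eps=1.5, candidates=[5, 20, 50, 100, 200])
for m, (n_clusters, n_noise) in results.items():
    print("min_samples=%d: %d clusters, %d noise points" % (m, n_clusters, n_noise))
```

On the toy data, a min_samples larger than any blob leaves every point labelled as noise, while a small value recovers the blobs. Running the same sweep on your own X (with your eps candidates) should show whether min_samples=100 is what is suppressing smaller clusters.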