python scikit-learn cluster-computing dbscan

DBSCAN cluster with metric='russellrao'


I ran into a problem using sklearn.cluster.DBSCAN. If I use DBSCAN(metric="russellrao"), what format should the data be in? I tried two formats, and both return y_pred = [-1 -1 -1 ..., -1 -1 -1]. The two formats are shown below.

from sklearn.cluster import DBSCAN

npy = df2.values  # df2 is a pandas DataFrame
y_pred = DBSCAN(metric="russellrao").fit_predict(npy)

1. npy = [image of the first data format, not preserved]

2. npy = [image of the second data format, not preserved]

print(y_pred)  # [-1 -1 -1 ..., -1 -1 -1]

So, which format is the right one?


Solution

  • You need to choose the other DBSCAN parameters (eps and min_samples) appropriately. A result of all -1 means every point was labeled as noise, which points to the parameters rather than the data format.

    IMHO, sklearn should not have defaults for them. In particular, eps depends very much on your data set and metric, so the default will almost always be a bad choice. Instead of providing bad defaults, it should force users to choose the parameters.
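As a rough illustration of the point about parameters, here is a minimal sketch on made-up boolean data (the array X, the group layout, and the values eps=0.75 and min_samples=3 are all assumptions chosen for this toy case, not recommendations for your data). The Russell-Rao dissimilarity between two boolean rows is (n - n_TT) / n, where n_TT counts the positions that are True in both, so sparse rows tend to sit far apart. Looking at the pairwise distances first gives a hint for where eps should be.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN

# Hypothetical toy data: 10 rows, 20 boolean features, two groups of 5 rows.
X = np.zeros((10, 20), dtype=bool)
X[:5, 0:6] = True     # group A shares features 0-5
X[5:, 10:16] = True   # group B shares features 10-15
X[0, 6] = True        # a little variation inside each group
X[5, 16] = True

# Inspect the pairwise Russell-Rao dissimilarities before picking eps.
dists = pdist(X, metric="russellrao")
min_dist = dists.min()
print("smallest pairwise distance:", min_dist)

# Defaults (eps=0.5, min_samples=5): eps is below the smallest distance here.
default_labels = DBSCAN(metric="russellrao", algorithm="brute").fit_predict(X)
print("default:", default_labels)

# eps chosen just above the within-group distance, min_samples lowered
# to suit the small groups. Both values are assumptions for this toy data.
tuned_labels = DBSCAN(
    eps=0.75, min_samples=3, metric="russellrao", algorithm="brute"
).fit_predict(X)
print("tuned:", tuned_labels)
```

On this toy data the smallest pairwise distance is above the default eps of 0.5, so the default run should mark every row as noise, while the tuned run should separate the two groups. For real data, a histogram of the pairwise distances (or of each point's distance to its k-th nearest neighbor) is one common way to pick a starting eps.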