I ran into a problem using sklearn.cluster.DBSCAN.
If I use DBSCAN(metric="russellrao"), what format should the data be in?
I tried two formats, and both return pred = [-1 -1 -1 ..., -1 -1 -1]. Both formats are shown below.
from sklearn.cluster import DBSCAN

npy = df2.values
y_pred = DBSCAN(metric="russellrao").fit_predict(npy)
1.
npy =
2.
npy =
print(y_pred)  # [-1 -1 -1 ..., -1 -1 -1]
So which format is the correct one?
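A label of -1 means DBSCAN marked the point as noise, so getting -1 everywhere means no point had enough neighbors within eps to form a cluster. You need to choose the other DBSCAN parameters (eps and min_samples) appropriately.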
IMHO, sklearn should not have defaults for them. In particular epsilon depends very much on your data set and metric, so the default will almost always be a bad choice. Instead of providing bad defaults, it should force users to choose the parameters.
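
To make this concrete, here is a minimal sketch of how the defaults can fail with Russell-Rao while hand-picked parameters work. The toy boolean data, the values eps=0.8 and min_samples=3, and the choice to precompute the distance matrix with scipy are all my assumptions for illustration; they are not taken from your data, so inspect your own distances before picking values.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN

# Toy boolean data (an assumption, not the asker's data):
# two groups of 5 identical rows, 20 features each, 6 features set per row.
a = np.zeros(20, dtype=bool)
a[0:6] = True
b = np.zeros(20, dtype=bool)
b[6:12] = True
X = np.array([a] * 5 + [b] * 5)

# Russell-Rao dissimilarity is (n - shared_true) / n, so here it is
# 14/20 = 0.7 within a group and 20/20 = 1.0 across groups.
D = squareform(pdist(X, metric="russellrao"))

# Defaults (eps=0.5, min_samples=5): no other point lies within 0.5,
# so every point is labeled noise.
default_labels = DBSCAN(metric="precomputed").fit_predict(D)

# eps chosen between the within-group and cross-group distances.
tuned_labels = DBSCAN(eps=0.8, min_samples=3, metric="precomputed").fit_predict(D)

print(default_labels)
print(tuned_labels)
```

With the defaults every point comes back as -1, which is the same symptom as in the question; with eps between 0.7 and 1.0 the two groups are recovered. On real data, looking at the distribution of values in D (for example the distance to each point's k-th nearest neighbor) is a reasonable way to pick eps.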