I have a set of data distributed on a sphere, and I am trying to understand what metric must be given to scikit-learn's DBSCAN function. It cannot be the Euclidean metric, because distances between points on a sphere are not Euclidean. Is there a metric implemented in the sklearn package for such cases, or is dividing the data into small subsets the easiest (if long and tedious) way to proceed?
P.S. I am a noob at Python.
P.P.S. If I "precompute" the metric, in what form do I have to submit my precomputed data? Like this?
             event1                     event2                     ...
    event1   0                          distance(event1,event2)    ...
    event2   distance(event1,event2)    0                          ...
Please help!
Have you tried metric="precomputed"? Then pass the distance matrix to the DBSCAN.fit function instead of the data.
From the documentation:

    X : array [n_samples, n_samples] or [n_samples, n_features]

    Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.
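
For example, here is a minimal sketch of the precomputed route, assuming your points are (latitude, longitude) pairs and using sklearn's haversine_distances helper to build the great-circle distance matrix (the sample coordinates and the eps value are purely illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.metrics.pairwise import haversine_distances

    # Hypothetical data: points on the sphere as (latitude, longitude) in degrees.
    points_deg = np.array([
        [10.0, 20.0],
        [10.1, 20.1],
        [-45.0, 170.0],
    ])
    # haversine_distances expects [latitude, longitude] in radians.
    points_rad = np.radians(points_deg)

    # Precompute the full n_samples x n_samples great-circle distance matrix.
    # The result is in radians (central angles on the unit sphere); multiply
    # by the sphere's radius if you want distances in physical units.
    D = haversine_distances(points_rad)

    # eps must be in the same units as D (radians here); the value is illustrative.
    db = DBSCAN(eps=0.05, min_samples=2, metric="precomputed")
    db.fit(D)
    print(db.labels_)  # one cluster label per point; -1 marks noise

Regarding your P.P.S.: the matrix is a plain square array of numbers with a zero diagonal (no event labels inside it), and its row/column order must match the order of your samples.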