python scikit-learn dbscan metric

How to use sklearn's DBSCAN with a spherical metric?


I have a set of data points distributed on a sphere, and I am trying to understand which metric must be passed to the DBSCAN function provided by scikit-learn. It cannot be the Euclidean metric, because distances between points on a sphere are not Euclidean. Is there a metric implemented in the sklearn package for such cases, or is dividing the data into small subsets the easiest (if long and tedious) way to proceed?

P.S. I am a noob at python

P.P.S. In case I "precompute" the metric, in what form do I have to submit my precomputed data? Like this?

0        event1                    event2                    ...
event1   0                         distance(event1,event2)   ...
event2   distance(event1,event2)   0                         ...

Please, help?


Solution

  • Have you tried metric="precomputed"?

    Then pass the square distance matrix (n_samples × n_samples, numbers only, without row or column labels) to the DBSCAN.fit function instead of the data.

    From the documentation:

    X array [n_samples, n_samples] or [n_samples, n_features] :

    Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as ‘precomputed’.
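
To make the precomputed route concrete, here is a minimal sketch. It assumes the points are given as (latitude, longitude) pairs in degrees; the sample coordinates, `eps_rad` and `min_samples` are made up for illustration. It uses `haversine_distances` from `sklearn.metrics.pairwise`, which expects [latitude, longitude] in radians and returns great-circle distances on the unit sphere, also in radians.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import haversine_distances

# Hypothetical sample data: (latitude, longitude) in degrees.
points_deg = np.array([
    [10.0, 20.0],
    [10.1, 20.1],
    [10.2, 19.9],
    [-45.0, 120.0],
    [-45.1, 120.1],
    [-44.9, 119.9],
    [80.0, -60.0],   # far away from everything else
])

# haversine_distances expects [latitude, longitude] in radians.
points_rad = np.radians(points_deg)

# Square (n_samples x n_samples) matrix of great-circle distances,
# in radians on the unit sphere. No labels, zeros on the diagonal.
dist = haversine_distances(points_rad)

# eps must be in the same units as the matrix (radians here).
# 1 degree is an arbitrary choice for this toy data.
eps_rad = np.radians(1.0)

# With metric="precomputed", fit() receives the distance matrix
# instead of the raw coordinates.
labels = DBSCAN(eps=eps_rad, min_samples=2, metric="precomputed").fit(dist).labels_
print(labels)  # a label of -1 marks noise points
```

The full matrix needs memory on the order of n_samples squared, so this works best for moderately sized data sets. For larger ones, passing the radian coordinates directly with `metric="haversine"` and `algorithm="ball_tree"` may be worth trying instead. To get distances in physical units, multiply the radian distances (and `eps`) by the sphere's radius.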