I want to use the DBSCAN implementation from sklearn. It lets you use a custom distance metric, but only a single eps value.
What I want is the following:
Let's say my points have 3 features each, so each point can be considered a numpy array of the form p=np.array([p1,p2,p3]). Two points p and q are neighbors if np.abs(p1-q1) < eps1 and np.abs(p2-q2) < eps2 and np.abs(p3-q3) < eps3. Usually, one would use d(p,q)<eps, where d(,) is a metric and eps a threshold.
Is there an easy way to implement this in sklearn?
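You can scale each feature by 1/eps_i and then use the maximum (Chebyshev) norm with eps=1. After scaling, every per-feature difference is within its threshold exactly when the largest scaled difference is within 1.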
p = p * [1/eps1, 1/eps2, 1/eps3]
c = sklearn.cluster.DBSCAN(eps=1, metric="chebyshev", ...)
Note that DBSCAN uses <=, not <.
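A minimal end-to-end sketch of the scaling approach. The threshold values, the toy data, and min_samples=2 are made up for illustration and are not from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical per-feature thresholds (eps1, eps2, eps3) and toy data.
eps = np.array([0.5, 2.0, 10.0])
X = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 1.5, 8.0],   # within every threshold of the first point
    [5.0, 0.0, 0.0],   # too far from the first two along feature 1
    [5.3, 1.0, 5.0],   # within every threshold of the third point
])

# Dividing by eps turns each per-feature threshold into 1.
X_scaled = X / eps

# Chebyshev distance is max_i |p_i - q_i|, so eps=1 tests all features at once.
# DBSCAN treats points at distance <= eps as neighbors.
labels = DBSCAN(eps=1.0, min_samples=2, metric="chebyshev").fit_predict(X_scaled)
print(labels)
```

With this data the first two points should end up in one cluster and the last two in another, because each pair satisfies all three conditions while no cross pair does.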
Or you precompute a binary "distance" matrix, where the distance is 0 if the three conditions hold, and 1 otherwise. But that needs O(n²) memory.
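The precomputed variant could look roughly like the sketch below. Again, the thresholds, data, and min_samples are illustrative assumptions. Any eps between 0 and 1 selects exactly the pairs with "distance" 0, so this version keeps the strict < from the question. The intermediate difference array here also needs O(n² · d) memory; it could be built in chunks if that matters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical per-feature thresholds and toy data.
eps = np.array([0.5, 2.0, 10.0])
X = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 1.5, 8.0],
    [5.0, 0.0, 0.0],
    [5.3, 1.0, 5.0],
])

# Pairwise absolute differences per feature, shape (n, n, 3).
diff = np.abs(X[:, None, :] - X[None, :, :])

# Binary "distance": 0 where all three conditions hold (strict <), 1 otherwise.
D = np.where((diff < eps).all(axis=2), 0.0, 1.0)

# eps=0.5 makes only the 0 entries count as neighbors.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```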