Search code examples
cluster-analysisdata-miningdbscan

Analytical way of estimating neighborhood radius for DBSCAN


I have seen many DBSCAN algorithm implemented using a formula to estimate the neighborhood radius (Eps) based on the given minimum points within a cluster (k).

[full code] http://toolz.googlecode.com/svn/trunk/CWT/dbscan.py

% Analytical calculation of rad if not given

function [Eps] = epsilon(x,k) 

[m,n] = size(x);

Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);

I have searched extensively to understand how this analytical formula was derived but been unsuccessful.


Solution

  • The estimation of a suboptimal radius is described in the OPTICS paper

    Looking for Natural Patterns in Analytical Data. 2. Tracing Local Density with OPTICS

    As outline in the paper there are assumptions to make this formulation useful.

    To sum up, citing the article, the density of the objects of a dataset can be compared with the density of the same number of objects uniformly distributed in the same volume as the data set. If the data set has a uniform distribution, then the neighborhood radius eps , containing k points can be estimated.