Search code examples
cluster-analysisdata-miningdbscan

How to compute a knee in k-distance plot?


I want to implement some kind of improvement of DBSCAN algorithm, where user do not need to enter input parameters (minPts and Eps). My idea is to use the K-distances plot, but what is the best method to compute the 'knee' of this plot? How to count when there are 2 or more knees on the plot?

Where I can find the source code for some DBSCAN improvement, like AUTODBSCAN, VDBSCAN, PDBSCAN or DBSCAN-DLP? Im looking for some basics, but nowhere I can find a good help. Maybe you've seen somewhere sample source codes?


Solution

  • DBSCAN has already been improved to death.

    In Google Scholar, it has 5361 citations, and probably 1000+ of these "improve" DBSCAN. And probably a dozen of these use the k-distance plot. But none of these are used in practise.

    If you want to continue this line of research, best get updated on what has been done since. In particular, have a look at OPTICS which does away with the Epsilon parameter completely (except for performance reasons when using indexes).

    Also have a look at HDBSCAN* by one of the original DBSCAN authors, Joerg Sander. That will likely be the most important DBSCAN extension besides his work on OPTICS and GDBSCAN.