Tags: algorithm, matlab, cluster-analysis, evaluation, dbscan

Which statistical evaluation criteria are appropriate for DBSCAN output?


I would like some advice about the DBSCAN clustering algorithm. I am applying it to a matrix of latitude and longitude data from a seismic catalogue. Which evaluation criteria are appropriate for judging whether the number of clusters produced by DBSCAN is correct? I am working in MATLAB and currently use the gap ('elbow') evaluation criterion with k-means, but I have read that this may not be appropriate, since k-means does not work well with density-based clusters. Also, the MATLAB implementation of DBSCAN has two outputs, type and class. Could someone tell me what the class output is? I think it assigns each data point to its cluster, but I am not sure. Any help would be appreciated. Thank you, Dennis


Solution

  • Most validation methods cannot handle noise points, which DBSCAN produces.

    You should try

    Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A., & Sander, J. (2014). Density-based clustering validation. In Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia, PA.

    which is the only approach I am aware of that is designed for density-based clusters. I have not tried it yet, though; I prefer manual evaluation.

    Instead of DBSCAN, also try OPTICS and HDBSCAN*.
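
To make the noise issue concrete, here is a minimal sketch in plain Python (not MATLAB, and not the validation index from the paper above). It assumes the clustering returns one integer label per point and marks noise with -1, a convention some implementations use; check what your own implementation returns. The helper names `summarize_labels` and `drop_noise` and the sample coordinates are hypothetical, chosen only for illustration.

```python
from collections import Counter

NOISE = -1  # assumed noise label; verify against your DBSCAN implementation


def summarize_labels(labels, noise=NOISE):
    """Count clusters and noise points in a list of cluster labels."""
    counts = Counter(labels)
    n_noise = counts.pop(noise, 0)  # noise is not a cluster, so remove it
    total = len(labels)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "noise_fraction": n_noise / total if total else 0.0,
        "cluster_sizes": dict(counts),
    }


def drop_noise(points, labels, noise=NOISE):
    """Return only the points (and their labels) that belong to a cluster."""
    kept = [(p, lab) for p, lab in zip(points, labels) if lab != noise]
    return [p for p, _ in kept], [lab for _, lab in kept]


# Made-up (latitude, longitude) pairs and labels, for illustration only.
points = [(38.1, 22.0), (38.2, 22.1), (40.5, 23.9), (35.0, 27.0),
          (40.6, 24.0), (40.4, 23.8), (36.9, 20.1), (38.0, 22.2)]
labels = [0, 0, 1, -1, 1, 1, -1, 0]

summary = summarize_labels(labels)
clustered_points, clustered_labels = drop_noise(points, labels)
print(summary)
print(len(clustered_points), "points remain after removing noise")
```

If you do pass only the clustered points to a classical index, report the noise fraction alongside the score: a result that discards many points as noise can look deceptively good on the points that remain.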