Tags: parameters, cluster-analysis, dbscan

Is minPts = 4 the best setting for any dataset when clustering with the DBSCAN algorithm?


The DBSCAN paper (https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf) explains that the minPts value must be 4 for any dataset that DBSCAN is used to cluster. Does this value give the best results for any Eps value?


Solution

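  • In later work, the authors suggest using minPts = 2 * dim as the default, where dim is the number of dimensions of the data (this gives minPts = 4 for 2-dimensional data).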

    J. Sander, M. Ester, H.-P. Kriegel, and X. Xu. 1998.
    Density-Based Clustering in Spatial Databases:
    The Algorithm GDBSCAN and its Applications.
    Data Mining and Knowledge Discovery 2, 2 (1998), 169–194.
    http://dx.doi.org/10.1023/A:1009745219419

    If you have duplicates, use a larger value: "Our experiments indicate that this value works well for databases D where each point occurs only once, i.e., if D is really a set of points."

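    Smaller values are usually more computationally efficient, so keep minPts small, but not too small: with minPts ≤ 2, any point that has a single other point within Eps already counts as a core point, so the density requirement filters out very little noise.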

    Always study your result. Never use it without double checking.
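
As a rough illustration of the advice above, the following minimal sketch derives the minPts = 2 * dim default from the data and then checks how many core points different minPts values produce for a fixed Eps. The helper names (`default_min_pts`, `count_core_points`), the toy data, and the Eps value of 0.2 are all assumptions made up for this example; they are not taken from the papers. A point counts as core if at least minPts points, itself included, lie within Eps.

```python
import math


def default_min_pts(points):
    """Heuristic default suggested by Sander et al. (1998): minPts = 2 * dim."""
    dim = len(points[0])
    return 2 * dim


def count_core_points(points, eps, min_pts):
    """Count points that have at least min_pts points (itself included) within eps."""
    core = 0
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            core += 1
    return core


# Hypothetical toy data: a dense 2-D blob of five points plus two isolated points.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (9.0, 0.0),
]

min_pts = default_min_pts(points)  # 2 * 2 = 4 for 2-D data

# Compare a smaller value, the default, and a larger value for the same eps.
for candidate in (2, min_pts, 2 * min_pts):
    n_core = count_core_points(points, eps=0.2, min_pts=candidate)
    print(f"minPts={candidate}: {n_core} core points")
```

On this toy data, the five blob points are core for minPts = 2 and minPts = 4, while minPts = 8 leaves no core points at all, so the default is only a starting point to be checked against the actual result. If you use scikit-learn, the corresponding parameter of `sklearn.cluster.DBSCAN` is `min_samples`.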