DBSCAN Algorithm: Outliers


In the DBSCAN algorithm, outliers are often discarded as noise, but in some applications these noisy data can be more interesting than the more regularly occurring ones. Why?


Solution

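  • The points marked as outliers aren't discarded as such; they are simply points that don't belong to any cluster. You can still inspect the set of non-clustered points and try to interpret them. In anomaly detection (for example, spotting fraudulent transactions or faulty sensor readings), these rare points are precisely what you are looking for.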

    DBSCAN is designed to find clusters without any prior knowledge of how many there are or what shape they have. It does this by iteratively expanding clusters from starting points in sufficiently dense regions. Outliers are simply the points that lie in sparsely populated regions, as defined by the eps (neighbourhood radius) and minPoints (minimum number of neighbours for a dense region) parameters.

    In practice, it takes some care to choose parameters that keep those outliers out of the clusters. If they are included, they often act as bridges between clusters and cause them to merge into a single, analytically useless blob.
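
To make this concrete, here is a minimal from-scratch DBSCAN sketch in Python using only the standard library. It is not scikit-learn's implementation, and the names `dbscan`, `region_query`, `POINTS` and the toy data are hypothetical, chosen for illustration. Following the convention of `sklearn.cluster.DBSCAN`, noise points receive the label -1 instead of being dropped, so they can be collected and inspected afterwards; `min_pts` counts the point itself, as scikit-learn's `min_samples` does. Running it with two values of `eps` shows the bridging effect described above: with a small radius the sparse chain of points between two dense groups is reported as noise, while a larger radius pulls that chain into the clusters and merges both groups into one.

```python
import math

NOISE = -1       # label for points that belong to no cluster (same convention as scikit-learn)
UNVISITED = None

# Hypothetical toy data: two dense 2x2 groups, a sparse chain of points between
# them, and one isolated point far away.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),      # dense group A
    (10, 0), (10, 1), (11, 0), (11, 1),  # dense group B
    (3, 0), (5.5, 0), (8, 0),            # sparse "bridge" between A and B
    (5, 20),                             # isolated outlier
]


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: one label per point, NOISE (-1) for unclustered points."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # i is a core point in a dense region: start a new cluster
        labels[i] = cluster
        seeds, k = list(neighbours), 0
        while k < len(seeds):  # iteratively expand the cluster
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense itself
                continue
            if labels[j] is not UNVISITED:
                continue             # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


if __name__ == "__main__":
    for eps in (1.5, 2.6):
        labels = dbscan(POINTS, eps=eps, min_pts=3)
        clusters = {label for label in labels if label != NOISE}
        # The outliers are not thrown away: collect them for inspection.
        noise = [p for p, label in zip(POINTS, labels) if label == NOISE]
        print(f"eps={eps}: {len(clusters)} cluster(s), noise={noise}")
    # eps=1.5: 2 cluster(s), noise=[(3, 0), (5.5, 0), (8, 0), (5, 20)]
    # eps=2.6: 1 cluster(s), noise=[(5, 20)]
```

With `eps=1.5` the sketch reports two clusters and four noise points. With `eps=2.6` the three chain points each gain enough neighbours to become core points, so groups A and B merge into one cluster and only the isolated point `(5, 20)` remains noise.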