r plot cluster-analysis dbscan

R: How to add the noise cluster into DBSCAN plot


I'm trying to plot DBSCAN results. This is what I have done so far. My distance matrix is here.

dbs55_CR_EUCL = dbscan(writeCRToMatrix,eps=0.006, MinPts = 4, method = "dist")

plot(writeCRToMatrix[dbs55_CR_EUCL$cluster>0,], 
     col=dbs55_CR_EUCL$cluster[dbs55_CR_EUCL$cluster>0],
     main="DBSCAN Clustering K = 4 \n (EPS=0.006, MinPts=4) without noise",
     pch = 20)

This is the plot: [image]

When I tried plotting all the clusters, including the noise cluster, I could only see 2 points in my plot: [image]

What I'm looking for is:

  1. To add the points in the noise cluster to the plot, but with a different symbol, similar to the following picture:

[image]

  2. To shade the cluster areas, as in the following picture:

[image]


Solution

  • Noise points get a cluster id of 0. Base R graphics draws colour 0 in the background colour, so those points are effectively invisible. To show the noise points (in black), shift every cluster id up by one:

    plot(writeCRToMatrix, 
      col=dbs55_CR_EUCL$cluster+1L,
      main="DBSCAN Clustering K = 4 \n (EPS=0.006, MinPts=4) with noise",
      pch = 20)
    

    If you want a different symbol for noise then you could do the following (adapted from the man page):

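    library(dbscan)
    set.seed(42)  # any seed; only makes the toy data reproducible
    n <- 100
    x <- cbind(
         x = runif(10, 0, 10) + rnorm(n, sd = 0.2),
         y = runif(10, 0, 10) + rnorm(n, sd = 0.2)
    )
    
    res <- dbscan::dbscan(x, eps = .2, minPts = 4)
    # colour 0 is not visible, so this shows the clustered points only
    plot(x, col=res$cluster, pch = 20)
    # add the noise points (cluster id 0); the comma selects whole rows
    points(x[res$cluster == 0L, , drop = FALSE], col = "grey", pch = "+")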
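
    The same result can be had from a single plot() call, because col and pch both accept one value per point. The following is a minimal sketch, not the package's own recipe: xy and cluster are made-up stand-ins for your coordinates and for the cluster vector returned by dbscan (0 marking noise), and the grey "+" styling is just one possible choice.

```r
# Toy stand-ins (hypothetical): swap in your own coordinates and the
# cluster vector returned by dbscan, where 0 marks noise.
xy <- cbind(x = c(1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0),
            y = c(1.0, 0.9, 1.1, 5.0, 5.2, 4.8, 0.5))
cluster <- c(1L, 1L, 1L, 2L, 2L, 2L, 0L)

is_noise <- cluster == 0L

# One symbol per point: pch 3 is a plus sign, pch 20 a small filled dot.
point_pch <- ifelse(is_noise, 3L, 20L)

# One colour per point: grey for noise, otherwise the palette entry for
# the cluster id (recycled if there are more clusters than colours).
pal <- palette()
point_col <- rep("grey", length(cluster))
point_col[!is_noise] <- pal[(cluster[!is_noise] - 1L) %% length(pal) + 1L]

plot(xy, col = point_col, pch = point_pch,
     main = "Clusters with noise shown as grey +")
legend("topright", legend = c("cluster 1", "cluster 2", "noise"),
       col = c(pal[1:2], "grey"), pch = c(20, 20, 3))
```

    Building the colour vector by assignment, rather than indexing the palette with the raw ids, avoids the pitfall that an index of 0 silently drops an element in R.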
    

    Here is code that draws a shaded convex hull around each cluster:

    library(ggplot2)
    library(data.table)
    library(dbscan)
    
    
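    # data.table expands the matrix into its columns x and y
    dt <- data.table(x, level=as.factor(res$cluster), key = "level")
    # chull() gives the row positions of each cluster's hull vertices
    hulls <- dt[, .SD[chull(x, y)], by = level]
    
    ### get rid of hull for noise
    hulls <- hulls[level != "0",]
    
    # one colour per cluster id, with grey reserved for noise
    ids <- setdiff(levels(dt$level), "0")
    cols <- c("0" = "grey", setNames(rainbow(length(ids)), ids))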
    
    ggplot(dt, aes(x=x, y=y, color=level)) +
      geom_point() +
      geom_polygon(data = hulls, aes(fill = level, group = level),
        alpha = 0.2, color = NA) +
      scale_color_manual(values = cols) +
      scale_fill_manual(values = cols)
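
    If you would rather stay in base graphics, shaded hulls can also be drawn with chull(), polygon() and adjustcolor(). This is only a sketch under the same assumptions as before: xy and cluster are invented toy values standing in for your data, and hull_index is a helper name introduced here for illustration.

```r
# Toy stand-ins (hypothetical): two square clusters with a centre point
# each, plus one noise point (cluster id 0).
xy <- cbind(x = c(1, 2, 2, 1, 1.5, 6, 7, 7, 6, 6.5, 4),
            y = c(1, 1, 2, 2, 1.5, 6, 6, 7, 7, 6.5, 9))
cluster <- c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 0L)

cluster_ids <- sort(setdiff(unique(cluster), 0L))  # skip the noise id
pal <- palette()

# Set up an empty plot first so the shaded hulls sit underneath the points.
plot(xy, type = "n", main = "Clusters with shaded convex hulls")

hull_index <- list()
for (id in cluster_ids) {
  members <- which(cluster == id)
  # chull() returns positions within `members`; map them back to rows of xy.
  hull_rows <- members[chull(xy[members, , drop = FALSE])]
  hull_index[[as.character(id)]] <- hull_rows
  # Semi-transparent fill in the cluster's colour, no border.
  fill <- adjustcolor(pal[(id - 1L) %% length(pal) + 1L], alpha.f = 0.2)
  polygon(xy[hull_rows, , drop = FALSE], col = fill, border = NA)
}

# Draw cluster members as dots and noise as grey plus signs on top.
is_noise <- cluster == 0L
points(xy[!is_noise, , drop = FALSE],
       col = pal[(cluster[!is_noise] - 1L) %% length(pal) + 1L], pch = 20)
points(xy[is_noise, , drop = FALSE], col = "grey", pch = 3)
```

    Noise is left out of the hull loop on purpose, since a hull around scattered outliers would cover most of the plot.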
    

    Hope this helps.