
Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"


I applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().

Here is my code:

library(fpc)
library(dbscan)


data("iris")
head(iris,2)


data1 <- iris[,1:4]
head(data1,2)

set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)

table(db$cluster,iris$Species)

plot(db,data1,main = 'DBSCAN')

Error: Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

How can I fix this error?


Solution

  • I have a suggestion below, but first I see two issues:

    1. You're loading two packages, fpc and dbscan, each of which has its own function named dbscan(). This can create tricky bugs later (e.g. if you change the order in which you load the packages, a different function will be run).
    2. It's not clear what you're trying to plot: neither what the x- and y-axes should be, nor the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always; consult ?plot), but here you're passing it a dbscan object and a data.frame, and it doesn't know how to handle them.
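
    To make the first point concrete, here is a minimal sketch (assuming both packages are installed) that checks which dbscan() a bare call resolves to, then calls each version explicitly with `::`. Note that the two functions spell the minimum-points argument differently: minPts in dbscan, MinPts in fpc. The object names db_dbscan and db_fpc are only illustrative.

    ```r
    library(fpc)
    library(dbscan)   # attached last, so its dbscan() masks fpc::dbscan()

    # which package does a bare dbscan() call come from?
    environmentName(environment(dbscan))

    # list every object name that is defined in more than one attached package
    conflicts(detail = TRUE)

    # calling through the namespace removes the ambiguity
    db_dbscan <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)
    db_fpc    <- fpc::dbscan(iris[, 1:4], eps = 0.45, MinPts = 5)

    # the two results are different kinds of object
    class(db_dbscan)
    class(db_fpc)
    ```

    Qualifying the call this way means the result no longer depends on the order of the library() calls.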

    Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:

    # load our packages
    # note: only loading dbscan, not loading fpc since we're not using it
    library(dbscan)
    library(ggplot2)
    library(dplyr)
    
    # run dbscan::dbscan() on the first four columns of iris
    db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
    
    # create a new data frame by binding the derived clusters to the original data
    # this keeps our input and output in the same dataframe for ease of reference
    data2 <- bind_cols(iris, cluster = factor(db$cluster))
    
    # make a table to confirm it gives the same results as the original code
    table(data2$cluster, data2$Species)
    
    # using ggplot, make a point plot with "jitter" so each point is visible
    # x-axis is species, y-axis is cluster, also coloured according to cluster
    ggplot(data2) +
      geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
                 position = "jitter") +
      labs(title = "DBSCAN")
    

    Here's the image it generates:

    [Image: a scatterplot generated using ggplot() showing species of iris clustered by the function dbscan::dbscan()]

    If you're looking for something else, please be more specific about what the final plot should look like.
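
    If the goal was instead to see the clusters in the space of the measurements themselves, the following is a rough sketch under that assumption. The choice of the two petal columns for the axes and the name plot_data are illustrative, not something taken from the question.

    ```r
    library(dbscan)
    library(ggplot2)

    # cluster on the four numeric columns, as in the question
    db <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)

    # dbscan::dbscan() labels noise points as cluster 0
    plot_data <- cbind(iris, cluster = factor(db$cluster))

    # two of the measurements on the axes, colour by cluster, shape by species
    ggplot(plot_data) +
      geom_point(mapping = aes(x = Petal.Length, y = Petal.Width,
                               colour = cluster, shape = Species)) +
      labs(title = "DBSCAN clusters in petal measurements")

    # base-graphics alternative: all pairs of measurements at once
    # (+ 1L so that noise, cluster 0, gets colour 1 rather than colour 0)
    pairs(iris[, 1:4], col = db$cluster + 1L, pch = 19, main = "DBSCAN")
    ```

    Passing explicit col and pch values to pairs() avoids relying on a plot method for the cluster object.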