Tags: r, ggplot2, k-means, scatter-plot

Scatter plot and clusters within it


I created a scatter plot of my data using the ggplot2 package. Since my data has a large number of points, I will explain my problem with a small, readily available dataset. Consider this scatter plot:

ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point()

[Figure: scatter plot of wt against mpg]

I want to use k-means clustering to cluster these data points and then show the clusters on the same scatter plot (the one shown above), not on a new dimensionality-reduction plot. How can I do this?


Solution

  • Here is an alternative using the factoextra package:

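    library(dplyr)       # provides %>% and select()
    library(factoextra)  # provides fviz_cluster(); also attaches ggplot2
    
    # Keep only the two plotted variables
    df <- mtcars %>% 
      select(x = wt, y = mpg)
    
    # Compute k-means with k = 3 on the standardized variables
    set.seed(123)
    res.km <- kmeans(scale(df), 3, nstart = 25)
    
    # Cluster assignment for each car
    res.km$cluster
    
    # Plot the points colored by cluster, with a convex hull around each cluster
    fviz_cluster(res.km, data = df,
                 palette = c("steelblue", "gold", "limegreen"), 
                 geom = "point",
                 ellipse.type = "convex", 
                 ggtheme = theme_bw()
    )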

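If the goal is to keep the exact wt and mpg axes of the original scatter plot, one possible approach is to run kmeans() yourself and map the resulting labels to the point color in ggplot2. The following is only a minimal sketch: it assumes k = 3 and the same seed as the factoextra answer above, and the names km, p and the cluster column are illustrative choices, not part of the original question. Scaling before clustering is also an assumption; drop scale() if you want distances computed in the raw units.

```r
library(ggplot2)

# Keep only the two variables shown in the scatter plot
df <- mtcars[, c("wt", "mpg")]

# Cluster on standardized values so wt and mpg contribute equally
# (assumption: k = 3; adjust centers to suit your data)
set.seed(123)
km <- kmeans(scale(df), centers = 3, nstart = 25)

# Store the cluster label as a factor so ggplot2 uses a discrete color scale
df$cluster <- factor(km$cluster)

# Same scatter plot as in the question, with points colored by cluster
p <- ggplot(df, aes(x = wt, y = mpg, colour = cluster)) +
  geom_point(size = 2) +
  labs(colour = "Cluster")
p
```

Because the clustering only adds a column to the data frame, the axes stay in the original units, and any further layers (for example geom_text() labels or a custom color scale) can be added to p in the usual way.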