
Create a scatterplot where color corresponds to a variable with multiple values


I have the following data frame df, where the column types holds up to 3 comma-separated types for each ID (the full dataset has approximately 3000 rows):

ID   types  grade  num 
a01  a,b,c   7.1    1 
a02  c,d     7.7    3   
a03  c       7.3    4   
a04  a,c,f   7.9    5   
a05  a,c,e   6.7    3
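For anyone reproducing this, the sample rows above can be rebuilt as a data frame. This is a sketch that covers only the five rows shown, not the full 3000-row dataset:

```r
# Rebuild the five sample rows shown above.
# `types` holds up to three comma-separated labels per ID.
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3),
  stringsAsFactors = FALSE
)
str(df)
```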

I want to create a scatterplot where the x axis is the num column, the y axis is grade, and the colour of each point corresponds to its type, similar to this: https://i.sstatic.net/vWmVK.png

However, since types can contain more than one value, I'm struggling to plot it. If each ID had only one type, I could simply use geom_point(aes(colour = types)), but since it can have up to 3, I don't know how to proceed.
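To see why the direct approach falls short, here is a sketch on the sample rows only, assuming ggplot2 is installed. Mapping the raw column makes ggplot2 treat each distinct string, such as "a,b,c", as its own discrete level. Points are then coloured by combination rather than by individual type:

```r
library(ggplot2)

# Sample rows from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3),
  stringsAsFactors = FALSE
)

# Naive mapping: each whole string becomes one colour level
p <- ggplot(df, aes(num, grade, colour = types)) +
  geom_point()

# One colour per distinct combination, not per individual type
length(unique(df$types))  # 5 for these rows
```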


Solution

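  • One option is tidyr::separate_rows(). By default it splits the given column at every separator it finds (any run of characters other than letters, digits and "."), puts each value on its own row and repeats the other columns. Each ID then appears once per type, so each type gets its own coloured point.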

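    library(tidyverse)  # loads ggplot2, tidyr and the %>% pipe
    df %>%
      separate_rows(types) %>%  # one row per ID/type pair
      ggplot(aes(num, grade, color = types)) +
      # geom_point() +  # rows of the same ID would sit exactly on top of each other
      geom_jitter(width = 0.1, height = 0.1)  # small random offsets keep them visible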
    

    Or more minimally:

    library(ggplot2)
    ggplot(tidyr::separate_rows(df, types), aes(num, grade, color = types)) +
      # geom_point() +  # points would be overplotted, as above
      geom_jitter(width = 0.1, height = 0.1)
    

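To check what the plot is actually drawing, the intermediate long-format data can be inspected on its own. This sketch rebuilds df from the sample rows in the question. The name df_long is just illustrative:

```r
library(tidyr)

# Sample rows from the question
df <- data.frame(
  ID    = c("a01", "a02", "a03", "a04", "a05"),
  types = c("a,b,c", "c,d", "c", "a,c,f", "a,c,e"),
  grade = c(7.1, 7.7, 7.3, 7.9, 6.7),
  num   = c(1, 3, 4, 5, 3),
  stringsAsFactors = FALSE
)

# Default separator splits on non-alphanumeric characters, so "a,b,c"
# becomes three rows sharing the same ID, grade and num
df_long <- separate_rows(df, types)

nrow(df_long)                        # 12 rows: 3 + 2 + 1 + 3 + 3
count_per_type <- table(df_long$types)
count_per_type
```

Because the rows of one ID share the same (num, grade) coordinates, plain geom_point() stacks them on top of each other. That is why the answer jitters the points slightly.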