Tags: r, ggplot2, forcats

How to automate the legend in a ggplot chart?


Consider this simple example:

library(dplyr)
library(forcats)
library(ggplot2)

mydata <- tibble(cat1 = c(1,1,2,2),
                 cat2 = c('a','b','a','b'),
                 value = c(10,20,-10,-20),
                 time = c(1,2,1,2))

mydata <- mydata %>% mutate(cat1 = factor(cat1),
                 cat2 = factor(cat2))

> mydata
# A tibble: 4 x 4
  cat1  cat2  value  time
  <fct> <fct> <dbl> <dbl>
1 1     a      10.0  1.00
2 1     b      20.0  2.00
3 2     a     -10.0  1.00
4 2     b     -20.0  2.00

Now I want to create a chart that uses the interaction of the two factor variables. I know I can use interaction() inside aes() (see below).

My big problem is that I do not know how to automate the labelling (and the colouring) of the interactions, so that I can avoid manual errors when using scale_colour_manual().

For instance:

ggplot(mydata,
       aes(x = time, y = value, col = interaction(cat1, cat2))) +
  geom_point(size = 15) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  theme(legend.position = "bottom",
        legend.text = element_text(size = 12, face = "bold")) +
  # values and labels are matched to the interaction levels by position,
  # so a wrong order here silently mislabels the legend
  scale_colour_manual(name = "",
                      values = c("red", "red4", "royalblue", "royalblue4"),
                      labels = c("1-b", "1-a",
                                 "2-a", "2-b"))

produces a plot whose legend has the wrong labels because of a (deliberate) mistake I made in scale_colour_manual(). Indeed, the bright red dot is 1-a and not 1-b (note that the labels are simply the levels of cat1 and cat2 joined with a dash). The point is that with more factor levels, guessing the right order can be tricky.
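Printing the levels that `interaction()` creates shows why the order is easy to get wrong. A minimal base-R sketch with the same two factors: with the default `lex.order = FALSE` the first factor varies fastest, and an unnamed `values` or `labels` vector passed to `scale_colour_manual()` is applied to these levels by position.

```r
# Rebuild the two factors from the example above (base R only)
cat1 <- factor(c(1, 1, 2, 2))
cat2 <- factor(c("a", "b", "a", "b"))

# Default (lex.order = FALSE): the FIRST factor varies fastest
lv <- levels(interaction(cat1, cat2))
print(lv)
#> [1] "1.a" "2.a" "1.b" "2.b"

# lex.order = TRUE gives the "reading order" one might expect instead
lv.lex <- levels(interaction(cat1, cat2, lex.order = TRUE))
print(lv.lex)
#> [1] "1.a" "1.b" "2.a" "2.b"
```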

Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
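The idea of storing the labels as strings in the data frame can be sketched in base R: once each row carries its own label, colours can be attached to labels by name, so the order in which they are written no longer matters. The `label` column name and the colour choices are illustrative assumptions. ggplot2's `scale_colour_manual()` documents the same behaviour for a named `values` vector: it is matched by name rather than by position.

```r
mydata <- data.frame(
  cat1  = factor(c(1, 1, 2, 2)),
  cat2  = factor(c("a", "b", "a", "b")),
  value = c(10, 20, -10, -20),
  time  = c(1, 2, 1, 2)
)

# Store the legend label directly in the data (hypothetical column name)
mydata$label <- paste(mydata$cat1, mydata$cat2, sep = "-")

# Colours keyed by label, written in a deliberately scrambled order
colour.by.label <- c("2-b" = "royalblue4", "1-a" = "red",
                     "2-a" = "royalblue",  "1-b" = "red4")

# Lookup by name: each row gets the colour of its own label
row.colours <- unname(colour.by.label[mydata$label])
print(data.frame(label = mydata$label, colour = row.colours))
```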

Thanks!


Solution

  • If the number of factor levels for cat1 / cat2 is not fixed (and could be much larger than 2), I would calculate the appropriate colours with hsv() rather than assign them manually.

    A handy colour cheatsheet summarises the HSV colour model rather nicely:

    [Image: HSV colour wheel]

    Hue (h) is essentially your rainbow colour wheel, saturation (s) determines how intense the colour is, and value (v) how bright it is (lower values are darker). Each parameter accepts values in the range [0, 1].
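A few calls to base R's `hsv()` (package grDevices, attached by default) make the three parameters concrete. Hue wraps around, so `h = 1` gives the same colour as `h = 0`; this is why dividing a level index by the number of levels still yields distinct, evenly spaced hues.

```r
# Pure hues at full saturation and value
red  <- hsv(h = 0,   s = 1, v = 1)
cyan <- hsv(h = 0.5, s = 1, v = 1)
red2 <- hsv(h = 1,   s = 1, v = 1)  # hue wraps around: same as h = 0

# Zero saturation removes the hue; value then runs from black to white
black <- hsv(h = 0, s = 0, v = 0)
white <- hsv(h = 0, s = 0, v = 1)

# Three evenly spaced hues, as used for the levels of cat1 below
print(hsv(h = (1:3) / 3, s = 1, v = 1))
#> [1] "#00FF00" "#0000FF" "#FF0000"
```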

    Here's how I would adapt it for this use case:

    mydata2 <- mydata %>%
    
      # use "-" instead of the default "." since we are using that for the labels anyway
      mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
    
      # cat1: assign hue evenly across the whole wheel,
      # cat2: keep both saturation & value within (0.3, 1], as the colours can
      #       look too faint / dark otherwise
      mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
                          s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
                          v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
    
    # create the vector of colours for scale_colour_manual()
    manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
    colour.vector <- manual.colour$colour
    names(colour.vector) <- manual.colour$interacted.variable
    rm(manual.colour)
    
    > colour.vector
          1-a       1-b       2-a       2-b 
    "#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000" 
    
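Written out, the colour assignment in the code above is as follows, with $i$ the index of a level of cat1 (out of $n_1$ levels) and $j$ the index of a level of cat2 (out of $n_2$ levels):

```latex
h_{ij} = \frac{i}{n_1}, \qquad
s_{ij} = v_{ij} = 0.3 + 0.7\,\frac{j}{n_2}, \qquad
i = 1, \dots, n_1, \quad j = 1, \dots, n_2
```

Because hue is circular, $h = 1$ is the same as $h = 0$, so the hues $1/n_1, \dots, 1$ are evenly spaced and all distinct, while saturation and value stay in $[0.3 + 0.7/n_2,\ 1]$. For the 2-by-2 example, 1-a has $i = j = 1$, so $h = 0.5$ and $s = v = 0.65$; at that hue the red, green and blue channels are $v(1-s) \approx 0.23$, $v$ and $v$, i.e. 58, 166 and 166 on a 0 to 255 scale, which is the `#3AA6A6` printed above.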

    With the colours calculated automatically for any number of factor levels, plotting becomes quite straightforward:

    ggplot(mydata2,
           aes(x = time, y = value, colour = interacted.variable)) +
      geom_point(size = 15) +
      scale_colour_manual(name = "",
                          values = colour.vector,
                          breaks = names(colour.vector)) +
      theme(legend.position = "bottom")
    


    An illustration with more factor levels (the code is the same, except that the rows are sorted with arrange(cat1, cat2) before extracting the colours, and guide_legend(byrow = TRUE) is specified in the colour scale):

    mydata3 <- data.frame(
      cat1 = factor(rep(1:3, times = 5)),
      # cat2 must be a factor: since R 4.0.0, data.frame() keeps strings as character
      cat2 = factor(rep(LETTERS[1:5], each = 3)),
      value = 1:15,
      time = 15:1
    ) %>%
      mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
             colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
                          s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
                          v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
    
    manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
      select(interacted.variable, colour) %>% unique()
    colour.vector <- manual.colour$colour
    names(colour.vector) <- manual.colour$interacted.variable
    rm(manual.colour)
    
    ggplot(mydata3,
           aes(x = time, y = value, colour = interacted.variable)) +
      geom_point(size = 15) +
      scale_colour_manual(name = "",
                          values = colour.vector,
                          breaks = names(colour.vector),
                          guide = guide_legend(byrow = TRUE)) +
      theme(legend.position = "bottom")
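The mutate()/unique() steps are repeated in both examples, so they could be wrapped in a function. Below is a sketch of a hypothetical helper, `interaction_colours()` (my own name, not from the answer or any package), in base R. It uses the same HSV formula, but builds the level combinations directly in the order that `interaction()` uses instead of extracting them from the data with `unique()`, so it needs no `arrange()` step and also returns colours for combinations that do not occur in the data.

```r
# Hypothetical helper: one HSV colour per level of interaction(f1, f2, sep = "-").
# Hue follows f1; saturation and value both follow f2 (floored at s.min).
interaction_colours <- function(f1, f2, s.min = 0.3) {
  f1 <- as.factor(f1)
  f2 <- as.factor(f2)
  n1 <- nlevels(f1)
  n2 <- nlevels(f2)

  # Level indices for every combination, with f1 varying fastest,
  # matching the level order of interaction(f1, f2) (lex.order = FALSE)
  i1 <- rep(seq_len(n1), times = n2)
  i2 <- rep(seq_len(n2), each = n1)

  sv <- s.min + (1 - s.min) * i2 / n2
  colours <- hsv(h = i1 / n1, s = sv, v = sv)
  names(colours) <- levels(interaction(f1, f2, sep = "-"))
  colours
}

cols <- interaction_colours(factor(c(1, 1, 2, 2)), factor(c("a", "b", "a", "b")))
print(cols)
```

In a plot, the result would be used as in the answer, e.g. `scale_colour_manual(values = cols)` together with `colour = interaction(cat1, cat2, sep = "-")` in `aes()`; because the vector is named, ggplot2 matches the colours to the levels by name.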
    
