
Changing legend labels when using multiple variables


In R, using ggplot2: I'm trying to draw a line plot with several columns as separate lines. I'm not using a fill = parameter, so I know that's why scale_fill_discrete doesn't work. From other similar questions, it seems that the other options (scale_colour_discrete, scale_shape_discrete, etc.) require you to use the matching aesthetic when first building the plot. That may be my main issue, but I don't know how to fix it with three different variables. Right now the legend shows three colors, but they are not associated with the right variables.

ggplot(summary_5yr) + 
geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color ="blue")) +
geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")

Solution

  • Try this. To get both the colors and the legend right, use scale_color_manual. Writing color = "blue" inside aes() does not set the line color to blue. Instead, "blue" is treated as a data value, i.e. a label, and ggplot2 assigns its default palette colors to those labels, which is why the colors and the legend entries don't match. You assign the actual color for each label via the values argument of scale_color_manual, and you set the legend text via its labels argument.

    A second approach is to reshape the data frame into long format, e.g. with tidyr::pivot_longer. Then a single geom_line layer is enough, and the legend labels are taken from the former column names automatically.

    library(ggplot2)
    library(tidyr)
    library(dplyr)
    
    set.seed(123)
    
    # Example data standing in for the asker's summary_5yr
    summary_5yr <- data.frame(
      Year = 2010:2020,
      NY_Med_Inc = runif(11, 10000, 50000),
      FL_Med_Inc = runif(11, 10000, 50000),
      WA_Med_Inc = runif(11, 10000, 50000)
    )
    
    # Approach 1: keep the three layers; the strings in aes() are legend keys,
    # and scale_color_manual maps each key to a color and a legend label
    ggplot(summary_5yr) + 
      geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color = "blue")) +
      geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
      geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
      scale_color_manual(values = c(blue = "blue", red = "red", green = "green"),
                         labels = c(blue = "NY_Med_Inc", red = "FL_Med_Inc", green = "WA_Med_Inc")) +
      labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
    
    # Approach 2: reshape to long format so one layer draws all three lines
    summary_5yr %>% 
      tidyr::pivot_longer(-Year, names_to = "var", values_to = "value") %>% 
      ggplot() + 
      geom_line(aes(x = Year, y = value, group = var, color = var)) +
      scale_color_manual(values = c(NY_Med_Inc = "blue", FL_Med_Inc = "red", WA_Med_Inc = "green")) +
      labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
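
Since the string inside aes() is only a legend key, a variation on the first approach is to use the text you want in the legend as the key itself, so that no separate labels argument is needed. The following is a minimal sketch under assumptions, not part of the original answer: the keys "New York", "Florida" and "Washington", the legend title "State", and the helper vector state_colors are made up for illustration, and the data is the same random stand-in as above.

```r
library(ggplot2)

set.seed(123)

# Random stand-in data with the same columns as the question's summary_5yr
summary_5yr <- data.frame(
  Year = 2010:2020,
  NY_Med_Inc = runif(11, 10000, 50000),
  FL_Med_Inc = runif(11, 10000, 50000),
  WA_Med_Inc = runif(11, 10000, 50000)
)

# Hypothetical legend keys (names) mapped to the line colors (values)
state_colors <- c("New York" = "blue", "Florida" = "red", "Washington" = "green")

p <- ggplot(summary_5yr, aes(x = Year)) +
  # The string in aes() is the legend key, not the color itself
  geom_line(aes(y = NY_Med_Inc, group = 1, color = "New York")) +
  geom_line(aes(y = FL_Med_Inc, group = 1, color = "Florida")) +
  geom_line(aes(y = WA_Med_Inc, group = 1, color = "Washington")) +
  # values maps key -> color; breaks fixes the legend order
  scale_color_manual(name = "State", values = state_colors,
                     breaks = names(state_colors)) +
  labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")

p
```

Without breaks, the legend entries would be sorted alphabetically by key; passing names(state_colors) keeps them in the order in which the vector was written.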