Tags: r, ggplot2, ggnewscale

Putting line legend in scatterplot with geom_point


I have the following sample scatterplot:

[Image: sample scatterplot]

I would like to remove the black dots from the "Project Performance" legend squares. I just want the legend to show the differently colored line types and their labels. Would someone know how to do this? I'm using ggnewscale for the first time to show legends for this scatter plot that uses both geom_point and geom_vline. I don't know how to manipulate the legends within ggnewscale.

Below is my code:

library(ggplot2)
library(ggpubr)      # stat_cor()
library(ggnewscale)  # new_scale_color()

#sample data
df <- data.frame(cohort=c("20-21","20-21","20-21","20-21","21-22","21-22","21-22","21-22","22-23","22-23","22-23","22-23"),
                             sat=c(1220, 1020, 850, 1160, 920, 970, 1170, 830, 730, 1200, 1090, 880),
                             project=c(2.5, 2.2, 2.6, 2.8, 2.9, 3.0,3.0, 2.8, 2.0, 2.5, 1.8, 2.0),
                             pass_sat=c("Met Threshold", "Met Threshold", "Did Not Meet Threshold", "Did Not Meet Threshold",
                                        "Met Threshold", "Did Not Meet Threshold", "Did Not Meet Threshold","Met Threshold",
                                        "Met Threshold", "Did Not Meet Threshold", "Did Not Meet Threshold","Met Threshold"))

threshold <- data.frame(x = c(2, 2.7), group = c("pass", "scholarship eligible"),
                        color = c("red", "darkgreen"), line=c("dashed", "solid"), stringsAsFactors = FALSE) 

group.colors <- c("Overall" ="#FFD580", "ELA" = "#CBC3E3", "Math" = "lightyellow", "Essential Skills"="lightpink")

#sample plot
#total sat against total project score 
ggplot(df, aes(x = project, y = sat)) + 
  geom_point() +
  facet_grid(.~cohort) +
  scale_x_continuous(limits = c(1, 3), breaks = seq(1, 3, by = 0.5)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Total Project score", y = "Total SAT") + 
  stat_cor(method = "pearson", size = 2.6, digits= 2) +
  geom_vline(
    aes(xintercept = x, colour = group, linetype = group),
    threshold
  )+ 
  scale_linetype_manual(
    name = "Project Performance",
    values = c(pass = "solid", "scholarship eligible" = "dashed")
  ) +
  scale_color_manual(
    name = "Project Performance",
    values = c(pass = "red", "scholarship eligible" = "darkgreen")
  ) +
  new_scale_color() +
  geom_point(aes(colour = pass_sat),
             show.legend = TRUE) +
  scale_color_manual(name = "SAT Performance", values = c("Met Threshold" = "purple", "Did Not Meet Threshold" = "black")) +
  labs(title= "Total SAT by Overall Project Score, by Year")

Thank you!


Solution

  • Two ways:

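    1. You call geom_point twice, but I believe the second call is the one you intended, since the first maps nothing in aes(). (I think this means the points are being plotted twice: the second layer covers the first, so it is not visible, but it is still happening.) The dots in the "Project Performance" legend come from show.legend = TRUE, which forces that point layer into every legend rather than only the legend for its own colour scale. If you comment out the first geom_point() and remove show.legend = TRUE from the second, it works.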

      ggplot(df, aes(x = project, y = sat)) + 
        # geom_point() +                                                    # REMOVE
        facet_grid(.~cohort) +
        scale_x_continuous(limits = c(1, 3), breaks = seq(1, 3, by = 0.5)) +
        geom_smooth(method = "lm", se = FALSE) +
        labs(x = "Total Project score", y = "Total SAT") + 
        ggpubr::stat_cor(method = "pearson", size = 2.6, digits= 2) +
        geom_vline(
          aes(xintercept = x, colour = group, linetype = group),
          threshold
        )+ 
        scale_linetype_manual(
          name = "Project Performance",
          values = c(pass = "solid", "scholarship eligible" = "dashed")
        ) +
        scale_color_manual(
          name = "Project Performance",
          values = c(pass = "red", "scholarship eligible" = "darkgreen")
        ) +
        ggnewscale::new_scale_color() +
        geom_point(aes(colour = pass_sat)) +                                # UPDATE
        scale_color_manual(name = "SAT Performance", values = c("Met Threshold" = "purple", "Did Not Meet Threshold" = "black")) +
        labs(title= "Total SAT by Overall Project Score, by Year")
      


    2. Add guides() with an override.aes before the call to new_scale_color(), while the first color scale is still in effect. (If you add it after the new scale, the dots are removed from the second legend instead, which is not what you asked for.)

      ggplot(df, aes(x = project, y = sat)) + 
        geom_point() +
        facet_grid(.~cohort) +
        scale_x_continuous(limits = c(1, 3), breaks = seq(1, 3, by = 0.5)) +
        geom_smooth(method = "lm", se = FALSE) +
        labs(x = "Total Project score", y = "Total SAT") + 
        ggpubr::stat_cor(method = "pearson", size = 2.6, digits= 2) +
        geom_vline(
          aes(xintercept = x, colour = group, linetype = group),
          threshold
        )+ 
        scale_linetype_manual(
          name = "Project Performance",
          values = c(pass = "solid", "scholarship eligible" = "dashed")
        ) +
        scale_color_manual(
          name = "Project Performance",
          values = c(pass = "red", "scholarship eligible" = "darkgreen")
        ) +
        guides(color = guide_legend(override.aes = list(shape = NA))) +     # NEW
        ggnewscale::new_scale_color() +
        geom_point(aes(colour = pass_sat),
                   show.legend = TRUE) +
        scale_color_manual(name = "SAT Performance", values = c("Met Threshold" = "purple", "Did Not Meet Threshold" = "black")) +
        labs(title= "Total SAT by Overall Project Score, by Year")
      

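
To see the effect of show.legend in isolation, here is a stripped-down sketch that needs only ggplot2 and ggnewscale. The toy data and the helper make_plot() are made up for illustration (they are not from the question), and the smoothing, faceting, and stat_cor() layers are left out. As far as I understand it, show.legend = NA (the default) puts a layer only in legends for aesthetics it maps, while TRUE always includes it, which is where the dots in the line legend come from.

```r
library(ggplot2)
library(ggnewscale)

# Toy data, invented for illustration (not the asker's data)
pts <- data.frame(
  project  = c(1.5, 2.2, 2.8),
  sat      = c(800, 1000, 1200),
  pass_sat = c("Did Not Meet Threshold", "Met Threshold", "Met Threshold")
)
threshold <- data.frame(x = c(2, 2.7), group = c("pass", "scholarship eligible"))

# Hypothetical helper: builds the same plot, varying only show.legend
# on the point layer
make_plot <- function(show_legend) {
  ggplot(pts, aes(x = project, y = sat)) +
    geom_vline(
      aes(xintercept = x, colour = group, linetype = group),
      data = threshold
    ) +
    scale_linetype_manual(
      name = "Project Performance",
      values = c(pass = "solid", "scholarship eligible" = "dashed")
    ) +
    scale_color_manual(
      name = "Project Performance",
      values = c(pass = "red", "scholarship eligible" = "darkgreen")
    ) +
    new_scale_color() +
    geom_point(aes(colour = pass_sat), show.legend = show_legend) +
    scale_color_manual(
      name = "SAT Performance",
      values = c("Met Threshold" = "purple", "Did Not Meet Threshold" = "black")
    )
}

p_forced  <- make_plot(TRUE)  # point layer forced into every legend
p_default <- make_plot(NA)    # point layer only in legends for aesthetics it maps

# print(p_forced); print(p_default)  # render both and compare the legends
```

Printing the two plots side by side should show the dots in the "Project Performance" keys only for p_forced. If you need show.legend = TRUE for some other reason, option 2 (override.aes) is the way to hide them.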