
How do I add a legend to a ggplot with two scatter plots (geom_point())?


I have a table that I read into R to plot with ggplot2. I need to plot two sets of data points and also use an ifelse() condition to highlight specific genes.

I would like to add a legend to the plot, but I don't know how. I tried scale_color_manual(), but the legend still doesn't show.

I'm not sure where I went wrong.

This is my command:

ggplot(data=combined_dataset) +
  geom_point(mapping = aes(x = rf_bjab_prot, y = rf_bjab_trans), 
             alpha=0.7, size = 3, color = ifelse(combined_dataset$name.miapaca %in% genes, "blue1", 'maroon')) +
  geom_point(mapping = aes(x = rf_miapaca_prot, y = rf_miapaca_trans), 
             alpha=0.2, size = 3, color = ifelse(combined_dataset$name.miapaca %in% genes, "#D55E00", 'skyblue')) +
  geom_text_repel(mapping = aes(x = rf_bjab_prot, y = rf_bjab_trans, 
                               label = ifelse(name.miapaca %in% genes, name.miapaca, "")), 
                  size = 3, box.padding = 1, color = c("darkblue") ) + 
  geom_text_repel(mapping = aes(x = rf_miapaca_prot, y = rf_miapaca_trans,
                                label = ifelse(rf_miapaca_prot >= 0.05 & rf_miapaca_trans >= 60, name.miapaca, "")),
                  size = 3, box.padding = 1, color = c("green") ) +
  geom_text_repel(mapping = aes(x = rf_bjab_prot, y = rf_bjab_trans, 
                                label = ifelse(rf_bjab_prot <= -0.5 & rf_bjab_trans <= -10, name.miapaca, '')),
                  size = 3, box.padding = 1, color = c("#009E73") )

which produces the following plot:


What do I need to change for the legend to be shown?

Thanks, Assa

P.S. These are the data points I'm trying to plot:

> genes <- c("EGFR", "REL", "IGHM", "CD79B")
> head(combined_dataset) |> select(starts_with("rf"))
  rf_miapaca_prot rf_bjab_prot rf_miapaca_trans rf_bjab_trans
1      0.14152102   0.00000000         33.06388     0.0000000
2      0.18698557   0.00000000         31.59254     0.0000000
3      0.47772063   0.01919795          0.00000     0.0000000
4      0.19858826   0.33263854         59.22759    -0.2085603
5      0.09897121   0.00000000         32.50715     0.0000000
6      0.16365261   0.00000000          0.00000     0.0000000

Running this simpler code doesn't produce a legend either:

ggplot(data=combined_dataset) +
  geom_point(mapping = aes(x = rf_bjab_prot, y = rf_bjab_trans), 
             alpha=0.7, size = 3, color = "blue1") +
  geom_point(mapping = aes(x = rf_miapaca_prot, y = rf_miapaca_trans), 
             alpha=0.2, size = 3, color =  "#D55E00")

Solution

  • The best way with ggplot2 is generally to pivot the data so that you can do a single call to geom_point(). To aid in this, I'll generate a helper frame that maps each "type" to the color and alpha you want.

    my_scales <- data.frame(type = c("rf_bjab", "rf_miapaca"), color = c("blue1", "#D55E00"), alpha = c(0.7, 0.2))
    my_scales
    #         type   color alpha
    # 1    rf_bjab   blue1   0.7
    # 2 rf_miapaca #D55E00   0.2
    
    longer <- tidyr::pivot_longer(
      combined_dataset, cols = everything(),
      names_pattern = "(.*)_(prot|trans)", names_to = c("type", ".value"))
    longer
    # # A tibble: 12 × 3
    #    type         prot  trans
    #    <chr>       <dbl>  <dbl>
    #  1 rf_miapaca 0.142  33.1  
    #  2 rf_bjab    0       0    
    #  3 rf_miapaca 0.187  31.6  
    #  4 rf_bjab    0       0    
    #  5 rf_miapaca 0.478   0    
    #  6 rf_bjab    0.0192  0    
    #  7 rf_miapaca 0.199  59.2  
    #  8 rf_bjab    0.333  -0.209
    #  9 rf_miapaca 0.0990 32.5  
    # 10 rf_bjab    0       0    
    # 11 rf_miapaca 0.164   0    
    # 12 rf_bjab    0       0    
    
    ggplot(longer, aes(prot, trans)) +
      geom_point(aes(color = type, alpha = type)) +
      scale_color_manual(values = setNames(my_scales$color, my_scales$type)) +
      scale_alpha_manual(values = setNames(my_scales$alpha, my_scales$type))
    

    [Figure: ggplot scatterplot, two colors, with a legend labeling the two colors and alphas]
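
For context on why the calls in the question draw no legend: ggplot2 builds legends only for aesthetics that are mapped inside aes(). A color passed as a fixed argument outside aes() is applied to the points directly and never reaches a scale, so scale_color_manual() has nothing to describe. A minimal sketch of the difference, using a hypothetical toy data frame df (not the asker's data):

```r
library(ggplot2)

# Hypothetical toy data, for illustration only.
df <- data.frame(x = 1:4, y = c(2, 1, 4, 3), grp = c("a", "a", "b", "b"))

# Fixed color outside aes(): points are maroon, but no legend is drawn.
p_fixed <- ggplot(df) +
  geom_point(aes(x, y), color = "maroon")

# Color mapped inside aes(): a color scale is used, so a legend appears.
p_mapped <- ggplot(df) +
  geom_point(aes(x, y, color = grp)) +
  scale_color_manual(values = c(a = "maroon", b = "skyblue"))

p_fixed
p_mapped
```

This is also why the pivoted version above gets its legend: color = type and alpha = type sit inside aes().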

    Sometimes pivoting/reshaping is not possible, or you just want a quick "hack". For those times, you can do something like this:

    ggplot(combined_dataset) +
      geom_point(aes(rf_bjab_prot, rf_bjab_trans, color = "rf_bjab", alpha = "rf_bjab")) +
      geom_point(aes(rf_miapaca_prot, rf_miapaca_trans, color = "rf_miapaca", alpha = "rf_miapaca")) +
      scale_color_manual(values = setNames(my_scales$color, my_scales$type)) + 
      scale_alpha_manual(values = setNames(my_scales$alpha, my_scales$type))
    

    [Figure: same ggplot scatterplot, now one legend for each of color and alpha]

    It's likely possible to combine the two legends into one so that the dots reflect both the color= and alpha= at the same time, but I think in general the first method is preferred and easier to scale to larger data.
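
One way the two legends might be combined: ggplot2 merges guides that share the same title and the same labels, so giving both manual scales the same name should collapse them into a single legend. The sketch below assumes that behavior and uses a small stand-in for combined_dataset built from the first three rows shown in the question. The title "type" is an arbitrary choice.

```r
library(ggplot2)

# Stand-in for combined_dataset: first three rows printed in the question.
combined_dataset <- data.frame(
  rf_miapaca_prot  = c(0.14152102, 0.18698557, 0.47772063),
  rf_bjab_prot     = c(0, 0, 0.01919795),
  rf_miapaca_trans = c(33.06388, 31.59254, 0),
  rf_bjab_trans    = c(0, 0, 0)
)

my_scales <- data.frame(type  = c("rf_bjab", "rf_miapaca"),
                        color = c("blue1", "#D55E00"),
                        alpha = c(0.7, 0.2))

p <- ggplot(combined_dataset) +
  geom_point(aes(rf_bjab_prot, rf_bjab_trans,
                 color = "rf_bjab", alpha = "rf_bjab")) +
  geom_point(aes(rf_miapaca_prot, rf_miapaca_trans,
                 color = "rf_miapaca", alpha = "rf_miapaca")) +
  # Same name on both scales so the two guides can merge into one legend.
  scale_color_manual(name = "type", values = setNames(my_scales$color, my_scales$type)) +
  scale_alpha_manual(name = "type", values = setNames(my_scales$alpha, my_scales$type))
p
```

The axis titles will default to the first layer's column names (rf_bjab_prot, rf_bjab_trans), so adding labs(x = ..., y = ...) is probably worthwhile with this approach.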


    Reshaping/pivoting the data from wide to long seems simple here, but it can be more difficult if you have other columns (that should not pivot) and aren't familiar with the functions. Here are a bunch of Q/As on SO that cover the topic with workable code, using tidyr::pivot_*, reshape2::melt, and even base::reshape: Reshaping data.frame from wide to long format, how to use pivot_longer in R? (includes stats::reshape); (names_pattern) Transforming wide data to long format with multiple variables; (multiple value columns) pivot_longer into multiple columns, Is there way to pivot_longer to multiple values columns in R?; (dual-pivot?) Pivot longer in R
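
As an illustration of the "other columns" case: the question's full table also carries a gene-name column (name.miapaca), which cols = everything() would try to pivot. A minimal sketch, assuming the measurement columns all start with "rf": restrict cols to those, and the name column is carried along as an identifier. The data frame below is a hypothetical stand-in; the numbers are the first two rows from the question, while the pairing of gene names with rows (and the name "GENE2") is invented for illustration.

```r
library(tidyr)

genes <- c("EGFR", "REL", "IGHM", "CD79B")

# Hypothetical stand-in for the full combined_dataset.
combined_dataset <- data.frame(
  name.miapaca     = c("EGFR", "GENE2"),   # assumed names, illustration only
  rf_miapaca_prot  = c(0.14152102, 0.18698557),
  rf_bjab_prot     = c(0, 0),
  rf_miapaca_trans = c(33.06388, 31.59254),
  rf_bjab_trans    = c(0, 0)
)

longer <- pivot_longer(
  combined_dataset,
  cols = starts_with("rf"),               # pivot only the measurement columns
  names_pattern = "(.*)_(prot|trans)",
  names_to = c("type", ".value")
)

# Flag the genes of interest so the flag can be mapped inside aes() later.
longer$highlight <- longer$name.miapaca %in% genes
longer
```

With a column like highlight in the long data, the ifelse() coloring from the question could be replaced by a mapped aesthetic, for example aes(color = interaction(type, highlight)), which would then show up in the legend.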