r ggplot2

A colour scale defined by some properties of the data (R,ggplot)


I'm trying to use a continuous colour scale, tweaked based on some property of the data. In the following example, let's say I want to highlight the points for which cc sits on or near the first quartile:

library(tidyverse)

dd <- tibble(aa=runif(50),bb=runif(50),cc=runif(50))

ggplot(dd)+
  geom_point(mapping=aes(x=aa,y=bb,colour=cc))+
  scale_colour_gradientn(colours=c("blue","red","blue"),
                         values = scales::rescale(c(0,quantile(dd$cc,0.25),1) ) )


This works as expected, but I need to explicitly reference dd$cc inside the call to scale_colour_gradientn. This is normally not a problem, but now I'm trying to put that into a pipeline, like so:

dd <- tibble(aa=runif(50),bb=runif(50),cc=runif(50))

dd %>% mutate(ee=aa/cc) %>%
ggplot()+
  geom_point(mapping=aes(x=aa,y=bb,colour=ee))+
  scale_colour_gradientn(colours=c("blue","red","blue"),
                         values = scales::rescale(c(0,quantile(dd$ee,0.25),1) ) )

Of course, this does not work. At the point where I call scale_colour_gradientn, dd has no column called ee, so dd$ee is meaningless. Also, I think (though I'm not too familiar with the intricacies of data masking and such) that since scale_colour_gradientn takes no data argument, it has no way of knowing what happened upstream in the pipeline.
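
To make that point concrete, here is a minimal sketch (the name piped is only there for illustration and is not part of my real code) showing that the pipeline returns a new tibble and leaves dd itself untouched:

```r
library(dplyr)

dd <- tibble(aa = runif(50), bb = runif(50), cc = runif(50))

# mutate() returns a modified copy; dd is not changed in place
piped <- dd %>% mutate(ee = aa / cc)

has_ee_piped <- "ee" %in% names(piped)  # the piped result has the new column
has_ee_dd    <- "ee" %in% names(dd)     # the original tibble does not
```

So any reference to dd$ee further down the pipeline looks at the original object, not at the result of mutate().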

And of course, in this example, it is easy to create an intermediate variable, e.g.:

dd2 <- dd %>% mutate(ee=aa/cc)

ggplot(dd2)+
  geom_point(mapping=aes(x=aa,y=bb,colour=ee))+
  scale_colour_gradientn(colours=c("blue","red","blue"),
                         values = scales::rescale(c(0,quantile(dd2$ee,0.25),1) ) )

But, for the sake of argument, let's say I want everything to run in one go (Ctrl+Enter in RStudio), and/or I don't want to add intermediate variables to the workspace.

Is there a way to write something like my second snippet, along these lines?

dd %>% mutate(ee=aa/cc) %>%
ggplot()+
  geom_point(mapping=aes(x=aa,y=bb,colour=ee))+
  scale_colour_gradientn(colours=c("blue","red","blue"),
                         values = scales::rescale(c(0,
                                    quantile(SOME MAGIC HERE TO GET ee,0.25),
                                    1) ) )

Thanks!


Solution

  • If you wrap your plot code in curly braces, the data you are piping in is available as the dot placeholder (.) anywhere inside the braces, like so:

    library(tidyverse)
    
    dd <- tibble(aa=runif(50),bb=runif(50),cc=runif(50))
    
    dd %>% mutate(ee=aa/cc) %>%
    {ggplot(.)+
      geom_point(mapping=aes(x=aa,y=bb,colour=ee))+
      scale_colour_gradientn(colours=c("blue","red","blue"),
                             values = scales::rescale(c(0,
                                        quantile(.$ee,0.25),
                                        1) ) )}
    

    Created on 2024-07-14 with reprex v2.1.1

    You can check out the magrittr documentation on lambda functions for more information on how this works.
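
    One caveat that may be worth checking: the values argument of scale_colour_gradientn expects positions between 0 and 1 along the scale's limits, which default to the range of the mapped data. With c(0, quantile, 1) the red stop only lands on the first quartile when the data happen to span roughly 0 to 1, which is not the case for ee = aa/cc. A minimal sketch of a helper that rescales against the actual data range instead (q1_stops is a hypothetical name, not part of ggplot2 or scales):

```r
# Hypothetical helper: positions (between 0 and 1) of the minimum,
# first quartile and maximum of x along the range of x
q1_stops <- function(x) {
  q1 <- quantile(x, 0.25, names = FALSE, na.rm = TRUE)
  scales::rescale(c(min(x, na.rm = TRUE), q1, max(x, na.rm = TRUE)))
}

# Small worked example: the first quartile of 0, 10, 20, 30, 40 is 10,
# which sits a quarter of the way along the range 0 to 40
stops <- q1_stops(c(0, 10, 20, 30, 40))
```

    Inside the braces above, this would be used as values = q1_stops(.$ee), assuming ee contains only finite values.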