r ggplot2 magrittr

How do I use subset in a ggplot pipe?


I am trying to use ggplot with subsets to make individually-styled lines for different values of MyName.

This works if I set up the data frame as a temporary variable temp that gets referred to in the subset function like

temp <- data.frame(x = ..., y = ..., MyName = ...)
temp %>% ggplot(aes(x = x, y = y)) + geom_line(data = subset(temp, MyName == "Var Name"), ...)

except that I prefer to avoid creating a temporary data frame.

Is there a syntax that allows me to avoid this? Something like the . in this, except correct:

data.frame(x = ..., y = ..., MyName = ...) %>%
  ggplot(aes(x = x, y = y)) + geom_line(data = subset(., MyName == "Var Name"), ...)

This says object '.' not found.


Solution

  • You can pass a formula in lambda syntax (~ ...) as the data argument of a layer. The layer then applies that formula to the data supplied to the main ggplot() call, so no temporary variable is needed.

    library(ggplot2)
    library(magrittr)
    
    iris %>% ggplot(aes(Sepal.Width, Sepal.Length)) +
      geom_point(data = ~subset(., Species == "setosa"))
    

    Created on 2021-02-04 by the reprex package (v1.0.0)
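
    Applied to the question's case, the same idea might look like the sketch below. The data frame contents and the second name "Other Name" are made up for illustration, since the original values are elided; the styling arguments are likewise just examples. The `.` in the question most likely fails because %>% binds more tightly than +, so geom_line() is evaluated outside the pipe, where no `.` exists.

    ```r
    library(ggplot2)
    library(magrittr)

    # Hypothetical data standing in for the question's elided data.frame(...)
    p <- data.frame(
      x = rep(1:5, times = 2),
      y = c(1, 3, 2, 5, 4, 2, 2, 3, 3, 4),
      MyName = rep(c("Var Name", "Other Name"), each = 5)
    ) %>%
      ggplot(aes(x = x, y = y)) +
      # Each layer subsets the plot data itself, so each line can be styled separately
      geom_line(data = ~ subset(., MyName == "Var Name"), colour = "red") +
      geom_line(data = ~ subset(., MyName == "Other Name"), linetype = "dashed")

    print(p)
    ```

    A plain function, such as function(d) subset(d, MyName == "Var Name"), should work in the same position if the formula shorthand feels too terse.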

    Some extra details on what happens under the hood: ggplot2 calls the fortify() S3 generic on every data argument in the layer() function. There is a ggplot2:::fortify.formula() method that calls rlang::as_function(), which turns the lambda-syntax formula into a 'real' function. The ggplot2:::Layer$layer_data() ggproto method then calls that function with the plot data as its only argument. Note that this is not how the pipe operator works: the . here is the argument of the lambda function, not magrittr's placeholder. It is a major convenience nonetheless.
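
    The mechanism described above can be checked by hand. This is a rough sketch rather than a view into ggplot2's internals: it only uses the exported rlang::as_function() and ggplot2::layer_data() functions, and the variable names f, p and ld are arbitrary.

    ```r
    library(ggplot2)

    # Convert the lambda formula into an ordinary function, the conversion
    # attributed to fortify.formula() above
    f <- rlang::as_function(~ subset(., Species == "setosa"))
    is.function(f)   # TRUE
    nrow(f(iris))    # 50 rows: only the setosa observations

    # Build the same plot as in the answer and inspect what the layer received
    p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
      geom_point(data = ~ subset(., Species == "setosa"))

    # layer_data() returns the computed data for layer 1
    ld <- layer_data(p, 1)
    nrow(ld)         # 50 as well, so the layer saw only the subset
    ```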