Tags: r, ggplot2, nse, standard-evaluation, non-standard-evaluation

Scoping of variables in aes(...) inside a function in ggplot


Consider this use of ggplot(...) inside a function.

x  <- seq(1,10,by=0.1)
df <- data.frame(x,y1=x, y2=cos(2*x)/(1+x))

library(ggplot2)
gg.fun <- function(){
  i=2
  plot(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
}

if(exists("i")) remove(i)
gg.fun()
# Error in `[.data.frame`(df, , i) : object 'i' not found
i=3
gg.fun()   # plots df[,3] vs. x

It looks like ggplot does not recognize the variable i defined inside the function, but does recognize i if it is defined in the global environment. Why is that?

Note that this gives the expected result.

gg.new <- function(){
  i=2
  plot(ggplot(data.frame(x=df$x,y=df[,i]),aes(x,y)) + geom_line())
}
if(exists("i")) remove(i)
gg.new()   # plots df[,2] vs. x
i=3
gg.new()   # also plots df[,2] vs. x

Solution

  • Let's inspect the structure of the ggplot object before it is rendered to see what's going on:

    gg.str <- function() {
         i=2
         str(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
    }
    
    gg.str()
    List of 9
     $ data       :'data.frame':    91 obs. of  3 variables:
      ..$ x : num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
      ..$ y1: num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
      ..$ y2: num [1:91] -0.208 -0.28 -0.335 -0.373 -0.393 ...
     $ layers     :List of 1
      ..$ :Classes 'proto', 'environment' <environment: 0x0000000009886ca0> 
     $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 fields
      ..$ scales: list()
      ..and 21 methods, of which 9 are possibly relevant:
      ..  add, clone, find, get_scales, has_scale, initialize, input, n, non_position_scales
     $ mapping    :List of 2
      ..$ x: symbol x
      ..$ y: language df[, i]
     $ theme      : list()
     $ coordinates:List of 1
      ..$ limits:List of 2
      .. ..$ x: NULL
      .. ..$ y: NULL
      ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
     $ facet      :List of 1
      ..$ shrink: logi TRUE
      ..- attr(*, "class")= chr [1:2] "null" "facet"
     $ plot_env   :<environment: R_GlobalEnv> 
     $ labels     :List of 2
      ..$ x: chr "x"
      ..$ y: chr "df[, i]"
     - attr(*, "class")= chr [1:2] "gg" "ggplot"
    

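    As we can see, the mapping for y is stored as an unevaluated expression, df[, i]. When we ask for the actual plotting, that expression is evaluated within plot_env, which here is the global environment, so the i defined inside the function is never seen. That is why the call fails when there is no global i and silently uses the global i when there is one. I do not know why this default was chosen; I assume there are reasons for it.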
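
    The mechanism can be reproduced in base R without ggplot2. The following is only a sketch of the idea, not ggplot2's actual code: the names dat, mapping_y, outer_env and inner_env are made up for illustration, with outer_env standing in for the global environment and inner_env for the function's local frame.

    ```r
    # A small data frame; the name dat is hypothetical.
    dat <- data.frame(x = 1:3, y1 = c(10, 20, 30), y2 = c(7, 8, 9))

    # An unevaluated expression, comparable to what str() showed for the y mapping.
    mapping_y <- quote(dat[, i])

    # outer_env plays the role of the global environment: it knows dat but not i.
    outer_env <- new.env(parent = baseenv())
    assign("dat", dat, envir = outer_env)

    # inner_env plays the role of the function's frame: it defines i and
    # falls back to outer_env for everything else.
    inner_env <- new.env(parent = outer_env)
    assign("i", 2, envir = inner_env)

    # Evaluated where i is defined, the expression yields column 2.
    res_inner <- eval(mapping_y, inner_env)
    print(res_inner)   # 10 20 30

    # Evaluated one level up, i cannot be found and the lookup fails.
    res_outer <- tryCatch(
      eval(mapping_y, outer_env),
      error = function(e) conditionMessage(e)
    )
    print(res_outer)   # an "object 'i' not found" message
    ```

    The expression is the same in both calls; only the environment it is evaluated in decides whether i can be resolved.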

    Here's a demo that overrides this behaviour by replacing plot_env on the plot object:

    gg.envir <- function(envir=environment()) {
        i=2
        p <- ggplot(df,aes(x=x,y=df[,i]))+geom_line()
        p$plot_env <- envir
        plot(p)
    }
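    # The default argument is evaluated inside the function, so envir is the
    # function's local environment, where i = 2; ok
    gg.envir()
    # Here environment() is evaluated by the caller, so envir is the global
    # environment (same as the default plot_env); fails if there is no global i
    gg.envir(environment())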
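
    An alternative that avoids depending on plot_env at all is to map the column by name rather than by an expression involving i. This is a sketch under the assumption of a ggplot2 version that supports the .data pronoun inside aes() (3.0.0 or later, as far as I know); the function name gg.col and its argument are made up for illustration.

    ```r
    library(ggplot2)

    x  <- seq(1, 10, by = 0.1)
    df <- data.frame(x, y1 = x, y2 = cos(2 * x) / (1 + x))

    # gg.col is a hypothetical helper: it turns the column index into a name
    # inside the function, so nothing about i has to be looked up at render time.
    gg.col <- function(i = 2) {
      col <- names(df)[i]
      ggplot(df, aes(x = x, y = .data[[col]])) +
        geom_line() +
        ylab(col)
    }

    p <- gg.col()   # build the plot for df[, 2]; print(p) to render it
    ```

    Because col is resolved to a plain string before the plot is built, the result should not depend on whether a global i exists.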