Lexical scoping and environments in R


I have the following code snippet, modeled on my original code:

func2 <- function(foos) {
  for (foo in foos)
    print(eval(parse(text = foo)))
  return(foos)
}

func1 <- function(vec) {
  text3_obj <- 'text3'
  vec <- c(vec, c('text3_obj'))
  return(func2(vec))
}

text1_obj <- 'text1'
text2_obj <- 'text2'
func1(c('text1_obj', 'text2_obj'))

In the main code I create two objects (text1_obj and text2_obj) and pass their names to func1(). That function appends the name of a third object to the vector and then calls func2(). In func2() I simply print the values of the named objects. Below is the output of this code:

[1] "text1"
[1] "text2"
Error in eval(parse(text = foo)) : object 'text3_obj' not found

After stepping through in debug mode I realized that even exists('text3_obj') called from inside func2() returns FALSE. So I have two questions:

  1. func1() is the immediate parent environment of func2(), so why can't text3_obj be resolved inside func2()? Variables even higher up, in the global environment, can be resolved, e.g. text1_obj.
  2. Why does explicitly naming the environment work like a charm in debug mode, e.g. exists('text3_obj', where = parent.frame()) and eval(parse(text = 'text3_obj'), parent.frame())?

Solution

  • You are confusing the "calling environment" with the "enclosing environment." Both terms are explained in Hadley Wickham's book "Advanced R":

    http://adv-r.had.co.nz/Environments.html

    The "calling environment" is the environment from which a function was called, and is returned by the unfortunately-named function parent.frame. However, the calling environment is not used for lexical scoping.

    The "enclosing environment" is the environment in which a function was created, and it is the one used for lexical scoping. You created both func1 and func2 in the global environment, so the global environment is the enclosing environment of both functions and is used for lexical scoping regardless of where they are called from. That is why func2 can see text1_obj, which lives in the global environment, but not text3_obj, which lives only in the execution environment of func1.
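
    The two environments can be compared directly. The following is only a sketch: where_am_i, caller and defined_in are hypothetical names that do not appear in the question's code.

    ```r
    defined_in <- environment()  # the global environment when run at top level

    # Hypothetical helper that reports both of its environments.
    where_am_i <- function() {
      enclosing <- environment(where_am_i)  # where the function was created
      calling <- parent.frame()             # where the function was called from
      list(enclosing = enclosing, calling = calling)
    }

    caller <- function() {
      here <- environment()  # execution environment of caller()
      res <- where_am_i()
      c(
        enclosing_is_top  = identical(res$enclosing, defined_in),
        calling_is_caller = identical(res$calling, here)
      )
    }

    result <- caller()
    print(result)
    ```

    Both entries should be TRUE: the enclosing environment stays where the function was defined, while the calling environment follows whoever calls it.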

    If you want func2 to use the execution environment of func1 for lexical scoping, you have (at least) two options. The first is to create func2 within func1:

    func1 <- function(vec) {
    
      func2 <- function(foos) {
        for (foo in foos)
          print(eval(parse(text = foo)))
        return(foos)
      }
    
      text3_obj <- 'text3'
      vec <- c(vec, c('text3_obj'))
      return(func2(vec))
    }
    

    then your test works as expected:

    > text1_obj <- 'text1'
    > text2_obj <- 'text2'
    > func1(c('text1_obj', 'text2_obj'))
    [1] "text1"
    [1] "text2"
    [1] "text3"
    [1] "text1_obj" "text2_obj" "text3_obj"
    

    Alternatively, you can create func2 at the top level and reassign its "enclosing environment" from within func1:

    func2 <- function(foos) {
      for (foo in foos)
        print(eval(parse(text = foo)))
      return(foos)
    }
    
    func1 <- function(vec) {
      text3_obj <- 'text3'
      vec <- c(vec, c('text3_obj'))
      environment(func2) <- environment()
      return(func2(vec))
    }
    

    This will also work as expected.
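
    A third possibility, closer to what the question found in the debugger, is to leave both functions at the top level and look the names up in the calling environment explicitly. This is only a sketch: func1b, func2b and the envir argument are hypothetical additions, and get() stands in for eval(parse(text = ...)) on the assumption that the strings are plain variable names.

    ```r
    # Variant of func2 that resolves names in an explicitly supplied
    # environment; by default, the environment func2b() was called from.
    func2b <- function(foos, envir = parent.frame()) {
      force(envir)  # resolve the calling environment up front
      for (foo in foos)
        print(get(foo, envir = envir))  # also searches enclosing environments of envir
      return(foos)
    }

    func1b <- function(vec) {
      text3_obj <- 'text3'
      vec <- c(vec, 'text3_obj')
      return(func2b(vec))
    }

    text1_obj <- 'text1'
    text2_obj <- 'text2'
    out <- func1b(c('text1_obj', 'text2_obj'))
    ```

    Here text3_obj is found in the execution environment of func1b, and text1_obj and text2_obj are found by continuing the search from there into the environment where func1b was defined.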

    An interesting tidbit I found while writing my demonstration code: when you reassign the environment of func2 from within func1, R creates a modified copy of func2 in the execution environment of func1. The original func2 in the global environment is left alone, so by the time you get back to the console its enclosing environment is unchanged. Witness:

    a = function() {
      print(identical(environment(a), globalenv()))
    }
    
    b = function(x) {
      environment(a) <- environment()
      a()
    }
    

    Test a() and b():

    > a()
    [1] TRUE
    > b()
    [1] FALSE
    > a()
    [1] TRUE
    >
    

    This was not what I expected, but it seems like really excellent behavior on the part of R. If this were not the case, the enclosing environment of a() would have been permanently changed to the execution environment of b(), and FALSE would have been returned the second time a() was called.
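
    One way to see the local copy, as a sketch: check whether the execution environment holds its own binding named a before and after the replacement call. b_check is a hypothetical variant of b(), and a is reduced to a stub here.

    ```r
    a <- function() NULL

    b_check <- function() {
      # inherits = FALSE restricts the lookup to this execution environment
      before <- exists("a", envir = environment(), inherits = FALSE)
      environment(a) <- environment()  # binds a modified copy of a() locally
      after <- exists("a", envir = environment(), inherits = FALSE)
      c(before = before, after = after)
    }

    res <- b_check()
    print(res)
    ```

    The before entry should be FALSE and the after entry TRUE, which matches the idea that the replacement call binds a new local a rather than touching the top-level one.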

    In fact, it turns out you can force the change onto the original a() in the global environment using <<-:

    a = function() {
      print(identical(environment(a), globalenv()))
    }
    
    b = function(x) {
      # set a variable in the execution environment of b() for use later...
      montePython = "I'm not dead yet!!"
      # change the enclosing environment of a() in the global environment
      # rather than making a local copy of a() in b()'s execution environment.
      environment(a) <<- environment()
      a()
    }
    

    Test a() and b():

    > a()
    [1] TRUE
    > b()
    [1] FALSE
    > a()
    [1] FALSE
    >
    

    Interestingly, this means that the (normally temporary) execution environment of b() persists in memory even after b() terminates, because a() still references the environment, so it can't be garbage collected. Witness:

    > environment(a)$montePython
    [1] "I'm not dead yet!!"
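
    If that persistence is unwanted, pointing a() back at its original environment drops the reference, after which the old execution environment can be garbage collected once nothing else refers to it. A sketch under that assumption, with a reduced to a stub and top a hypothetical name for the environment where a() was defined:

    ```r
    top <- environment()  # the global environment when run at top level
    a <- function() NULL

    b <- function() {
      montePython <- "I'm not dead yet!!"
      environment(a) <<- environment()  # modifies the top-level a()
      invisible(NULL)
    }

    b()
    kept <- environment(a)$montePython  # still reachable through a()

    environment(a) <- top               # restore the original enclosing environment
    released <- is.null(environment(a)$montePython)
    ```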