lazy evaluation in R


I have the following code in R:

named_list = list()
for (i in 1:5){
named_list[[i]] = function(one,two){c(one,two, i)}
}

However, when I call the function:

> named_list[[1]]("first", "second")
[1] "first"  "second" "5"

Is there a way to get this to work properly (to return "first", "second", "1") without using the apply functions? I have tried to use the force function as recommended in another thread, but I cannot get it to work.

Thanks.

Edit: For some clarification, I am looking to make a list of functions, each of which encloses the index of where that function is in that list. In particular, observe that

> named_list[[1]]("first", "second")
[1] "first"  "second" "5"

> named_list[[2]]("first", "second")
[1] "first"  "second" "5"

> named_list[[3]]("first", "second")
[1] "first"  "second" "5"

> named_list[[4]]("first", "second")
[1] "first"  "second" "5"

> named_list[[5]]("first", "second")
[1] "first"  "second" "5"

which is obviously not the desired behaviour. The problem is that looping i through 1 to 5, R sees the first 'i' indexing the named_list, but doesn't see the second 'i' which is inside the function I am trying to define.

I am aware that the following is a possible solution (although I do not know why it works):

named_list = lapply(1:5, function(i) function(one,two)(c(one,two,i)))

but I want to know if there is an alternative solution that uses the for loop.


Solution

  • I think your problem is related to scoping. Namely, when a function references a variable that has not been defined locally in that function, R searches the function's enclosing environment (the environment in which the function was defined, its "parent"); if the variable is not there, R goes to that environment's parent (the "grand-parent"), and so on. (One good read for this is Advanced R: Environments; an extra read might be the same book's chapter on Memory.)
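
As a minimal sketch of that search order (the names `x`, `outer_fn`, `inner_fn`, and `global_fn` are made up for illustration and do not come from the question):

```r
x <- "global"

outer_fn <- function() {
  x <- "enclosing"
  # inner_fn has no local x, so R looks in the environment where
  # inner_fn was defined (the body of outer_fn) and finds x there.
  inner_fn <- function() x
  inner_fn()
}

# global_fn has no local x either; its enclosing environment is the
# one holding the top-level x, so that value is used.
global_fn <- function() x

outer_result  <- outer_fn()   # "enclosing"
global_result <- global_fn()  # "global"
```

The lookup follows where each function was defined, not where it was called from.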

    It's helpful to look at the environment being used/searched at any given time. I'll focus on the current, parent, and when inside the function, the "grand-parent" environments; realize, though, that deeply nested functions may have many more (which suggests you need to be very careful when depending on R to hunt down and find the specific instance of a variable not in the local environment!).

    NB: you will very likely not get the same <environment: 0x000...> pointers. These references are completely unreproducible and change each time this code is run.


    Let's start with the lapply setup that works:

    print(environment())
    # <environment: R_GlobalEnv>
    nl1 <- lapply(1:2, function(i) {
      e1 <- environment()
      str(list(where="inside lapply", env=e1, parent=parent.env(e1)))
      function(one,two) {
        e2 <- environment()
        str(list(where="inside func", env=e2, parent=parent.env(e2),
                 grandparent=parent.env(parent.env(e2))))
        c(one, two, i)
      }
    })
    # List of 3
    #  $ where : chr "inside lapply"
    #  $ env   :<environment: 0x0000000009128fe0> 
    #  $ parent:<environment: R_GlobalEnv> 
    # List of 3
    #  $ where : chr "inside lapply"
    #  $ env   :<environment: 0x00000000090bb578> 
    #  $ parent:<environment: R_GlobalEnv> 
    

    First notice that with each iteration within lapply, there is a new environment, starting with 9128fe0, whose parent is the global env. Within the second iteration of the lapply, we are in 90bb578, and within that environment, we define the function(one,two) whose local environment is 8f811b8 (which we see in the next code block).

    Realize that at this time, R has not attempted to resolve i. Let's run a function:

    nl1[[2]](11,12)
    # List of 4
    #  $ where      : chr "inside func"
    #  $ env        :<environment: 0x0000000008f811b8> 
    #  $ parent     :<environment: 0x00000000090bb578> 
    #  $ grandparent:<environment: R_GlobalEnv> 
    # [1] 11 12  2
    

    So when we reference i, R searches in the following, in order, to find it:

    • 8f811b8: inside function(one,two)..., not found
    • 90bb578: immediate parent env, inside function(i) ...; found
    • R_GlobalEnv (not searched, since it was found previously)
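
The same lookup can be checked without printing addresses, by asking a closure for its enclosing environment directly. This is only a sketch: `fns`, `enc`, `captured_i`, and `has_own_i` are illustrative names, and it assumes R 3.2.0 or later, where lapply forces the arguments of the function it applies.

```r
# Same construction as the lapply solution in the question.
fns <- lapply(1:2, function(i) function(one, two) c(one, two, i))

# environment() on a closure returns the environment it was defined in,
# here the per-iteration environment of function(i).
enc <- environment(fns[[2]])

# i lives directly in that environment (no need to search further up).
has_own_i  <- exists("i", envir = enc, inherits = FALSE)  # TRUE
captured_i <- get("i", envir = enc)                       # 2
```

Each element of `fns` has its own enclosing environment with its own `i`, which is why the lapply version returns the expected index.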

    Okay, let's try the for loop:

    nl2 <- list()
    for (i in 1:2) {
      e1 <- environment()
      str(list(where="inside for", env=e1, parent=parent.env(e1)))
      nl2[[i]] <- function(one,two) {
        e2 <- environment()
        str(list(where="inside func", env=e2, parent=parent.env(e2),
                 grandparent=parent.env(parent.env(e2))))
        c(one, two, i)
      }
    }
    # List of 3
    #  $ where : chr "inside for"
    #  $ env   :<environment: R_GlobalEnv> 
    #  $ parent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    # List of 3
    #  $ where : chr "inside for"
    #  $ env   :<environment: R_GlobalEnv> 
    #  $ parent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    

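    First thing to notice is that within each iteration of the for loop, the local environment is R_GlobalEnv, which should make sense: a for loop does not create a new environment. (You can safely ignore the reference to the tcltk environment as the parent; the parent of the global environment is simply whichever package was attached most recently.)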

    Okay, now when we get to the nl2[[1]] call, notice that the parent environment is (perhaps now, not surprisingly) the R_GlobalEnv environment:

    nl2[[1]](11,12)
    # List of 4
    #  $ where      : chr "inside func"
    #  $ env        :<environment: 0x000000001b1a6720> 
    #  $ parent     :<environment: R_GlobalEnv> 
    #  $ grandparent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    # [1] 11 12  2
    

    This was the first time that R needed to find i, so it first searched within 1b1a6720 (within function(one,two), where it was not found), and then in the R_GlobalEnv.

    So why did it return "2"?

    Because the value of i in R_GlobalEnv is, at the time we called nl2[[1]], the last value of i in the for loop. See this:

    rm(i)
    for (i in 1:100) { } # no-op
    i
    # [1] 100
    

    What's even more telling is if we try to call the function now:

    nl2[[1]](11,12)
    # List of 4
    #  $ where      : chr "inside func"
    #  $ env        :<environment: 0x000000000712c2a0> 
    #  $ parent     :<environment: R_GlobalEnv> 
    #  $ grandparent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    # [1]  11  12 100
    

    So the lookup of i within that function is lazy in the sense that R only searches for i when you call the function, not when you define it.

    In your environment (before you change any code), if you typed in i <- 100, you would see similar behavior.
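
A compact way to see this, as a sketch with illustrative names (`shared`, `before`, `after`, `same_env`): every function created in the loop has the same enclosing environment, so they all read the one shared `i`.

```r
shared <- list()
for (i in 1:3) {
  shared[[i]] <- function() i   # i is not captured here, only referenced
}

# After the loop, i holds its last value, so every function returns it.
before <- c(shared[[1]](), shared[[2]](), shared[[3]]())  # 3 3 3

# Changing i afterwards changes what every function returns.
i <- 100L
after <- c(shared[[1]](), shared[[2]](), shared[[3]]())   # 100 100 100

# All of the functions share one and the same enclosing environment.
same_env <- identical(environment(shared[[1]]), environment(shared[[3]]))  # TRUE
```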


    If you are absolutely against using lapply (which is my preferred method, even if I don't understand your underlying need), try explicitly defining the environment that surrounds your function. One way is to use local, which evaluates its body in a new environment whose parent is the current one; this preserves searching within existing parent environments while allowing us to "force" which i we want used. (Other options exist; I invite others to comment and you to explore environments more.)

    nl3 <- list()
    for (i in 1:2) {
      e1 <- environment()
      str(list(where="inside for", env=e1, parent=parent.env(e1)))
      nl3[[i]] <- local({
        i <- i # forces it locally within this env
        function(one,two) {
          e2 <- environment()
          str(list(where="inside func", env=e2, parent=parent.env(e2),
                   grandparent=parent.env(parent.env(e2))))
          c(one, two, i)
        }
      })
    }
    # List of 3
    #  $ where : chr "inside for"
    #  $ env   :<environment: R_GlobalEnv> 
    #  $ parent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    # List of 3
    #  $ where : chr "inside for"
    #  $ env   :<environment: R_GlobalEnv> 
    #  $ parent:<environment: package:tcltk> 
    #   ..- attr(*, "name")= chr "package:tcltk"
    #   ..- attr(*, "path")= chr "c:/R/R-3.3.3/library/tcltk"
    nl3[[1]](11,12)
    # List of 4
    #  $ where      : chr "inside func"
    #  $ env        :<environment: 0x0000000019ca23e0> 
    #  $ parent     :<environment: 0x000000001aabe388> 
    #  $ grandparent:<environment: R_GlobalEnv> 
    # [1] 11 12  1
    i <- 1000
    nl3[[1]](11,12)
    # List of 4
    #  $ where      : chr "inside func"
    #  $ env        :<environment: 0x0000000008d0bc78> 
    #  $ parent     :<environment: 0x000000001aabe388> 
    #  $ grandparent:<environment: R_GlobalEnv> 
    # [1] 11 12  1
    

    (You may notice that the local environment when you call the function changes each time while the parent does not. This is because when you call a function, it starts at the beginning of the function's call with a new environment. You "know" and rely on this because you assume that at the beginning of your function, no variables are defined. This is normal.)
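
Since the question mentions trying force without success, here is one possible way to combine it with a for loop. It is a sketch rather than the only approach, and `make_fn` and `nl4` are hypothetical names. force only acts on a function argument, so the inner function is built by a small factory function that takes `i` as its argument.

```r
make_fn <- function(i) {
  # Evaluate the argument now. Without this, i would stay an unevaluated
  # promise until the returned function is first called, by which time
  # the loop variable may already have moved on.
  force(i)
  function(one, two) c(one, two, i)
}

nl4 <- list()
for (i in 1:5) {
  nl4[[i]] <- make_fn(i)   # each call creates a fresh environment holding i
}

res_first <- nl4[[1]]("first", "second")  # "first" "second" "1"
res_last  <- nl4[[5]]("first", "second")  # "first" "second" "5"
```

Each call to `make_fn` plays the same role as one iteration of the lapply version or one local block: it provides a separate enclosing environment with its own `i`.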