
R - Why does a for loop give unexpected results when the accumulator acc is a function?


The issue

Consider this case:

acc = \ (a) \ (b) base::list(a = a, b = b)
for (x in c("A","B")) {acc = acc(x)}
acc

It returns:

$a
[1] "B"

$b
[1] "B"

But I expected it to behave the same as:

acc = \ (a) \ (b) base::list(a = a, b = b)
# for (x in c("A","B")) {acc = acc(x)}
acc = acc("A")
acc = acc("B")
acc

That is, it should return:

$a
[1] "A"

$b
[1] "B"

More cases

acc = \ (x) \ (y) \ (z) base::list(x = x, y = y, z = z)
for (i in c("A","B","C")) {acc = acc(i)}
acc

It returns:

$x
[1] "C"

$y
[1] "C"

$z
[1] "C"

But I expected it to behave the same as:

acc = \ (x) \ (y) \ (z) base::list(x = x, y = y, z = z)
# for (i in c("A","B","C")) {acc = acc(i)}
acc = acc("A")
acc = acc("B")
acc = acc("C")
acc

That is, it should return:

$x
[1] "A"

$y
[1] "B"

$z
[1] "C"

The same issue occurs with a while loop:

acc = \ (a) \ (b) base::list(a = a, b = b)

.iter = c("A","B")
.i = 0
while (.i < length(.iter)) {acc <<- acc <- acc(.iter[.i + 1]); .i = .i + 1}

print(acc)

So ...

What feature of the R language causes this?

I tested this in both R 4.3.3 and the webR REPL. A simple demo can be found here.


I found a way to fix this problem (inspired by base::Reduce and purrr:::reduce_impl):

acc = \ (a) \ (b) base::list(a = a, b = b)
for (x in c("A","B")) {acc <- base::forceAndCall(2, \ (f, ...) f(...), acc, x)}
acc

That works.
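To see why the forceAndCall() version helps, here is a minimal comparison of the same wrapper called directly versus through forceAndCall(2, ...). It assumes R >= 4.1 for the \(x) lambda syntax used in the question; the names wrapper, run_plain and run_forced are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers for illustration; the wrapper is the one from the question.
wrapper <- \(f, ...) f(...)

run_plain <- function(values) {
  acc <- \(a) \(b) list(a = a, b = b)
  for (x in values) {
    # Ordinary call: `x` reaches the factory as an unevaluated promise,
    # so the captured `a` is only looked up later, after `x` has changed.
    acc <- wrapper(acc, x)
  }
  acc
}

run_forced <- function(values) {
  acc <- \(a) \(b) list(a = a, b = b)
  for (x in values) {
    # forceAndCall(2, ...) evaluates the first two arguments (acc and x)
    # before calling the wrapper, so the current value of `x` is captured.
    acc <- forceAndCall(2, wrapper, acc, x)
  }
  acc
}

plain  <- run_plain(c("A", "B"))
forced <- run_forced(c("A", "B"))
```

Here plain reproduces the bug (a is "B"), while forced gives a = "A" and b = "B".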

From the help page for base::forceAndCall:

Call a function with a specified number of leading arguments forced before the call if the function is a closure.

...

forceAndCall is intended to help defining higher order functions like apply to behave more reasonably when the result returned by the function applied is a closure that captured its arguments.

(But that does not explain why it happens, so I am just quoting it here for reference.)
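For comparison, here is a sketch of how base higher-order functions behave with an unforced factory. It assumes R >= 4.1 (for the \(x) syntax) and relies on the change made in R 3.2.0, where lapply(), Reduce() and related functions began forcing the arguments they pass to the applied function; make_adder is a hypothetical name.

```r
# A factory that does NOT force its argument.
make_adder <- function(n) function(x) x + n

# lapply() forces each element before calling the factory,
# so every manufactured adder keeps its own n.
adders <- lapply(1:3, make_adder)
sums <- vapply(adders, function(add) add(10), numeric(1))

# Reduce() with `init` computes f(f(init, x[[1]]), x[[2]]), forcing both
# arguments of f at each step, much like the forceAndCall() fix above.
acc <- \(a) \(b) list(a = a, b = b)
reduced <- Reduce(\(f, x) f(x), c("A", "B"), acc)
```

In this sketch sums is c(11, 12, 13) and reduced is list(a = "A", b = "B").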


Solution

  • As Roland says in the comments, this is down to lazy evaluation.

    As first defined, acc is a function factory, i.e. a function which itself returns a function. The returned inner function is a closure, which has access to the variables in the outer function that created it.

    It's a little clearer if we write the function out in full:

    acc <- function (a) {
               return(function (b) {
                        list(a = a, b = b)
                      })
           }
    

    This setup works as expected if we call the function directly:

    acc <- acc("A")
    acc("B")
    #> $a
    #> [1] "A"
    #> 
    #> $b
    #> [1] "B"
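    A quick way to see that each call to the factory produces a separate closure with its own captured a is to compare two manufactured functions (a minimal sketch; make_pair is a hypothetical name for the same factory):

    ```r
    make_pair <- function(a) function(b) list(a = a, b = b)

    with_A <- make_pair("A")
    with_X <- make_pair("X")

    # Each manufactured function has its own enclosing environment,
    # and each of those environments holds its own binding for `a`.
    separate_envs <- !identical(environment(with_A), environment(with_X))
    a_in_A <- with_A("B")$a
    a_in_X <- with_X("B")$a
    ```

    Here separate_envs is TRUE, a_in_A is "A" and a_in_X is "X".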
    

    However, inside the loop, acc is not called directly with the values "A" and then "B". Instead, it is called with the loop variable i. Because of R's lazy evaluation, a function argument is not evaluated until it is first used. The inner function has access to the variable a, but unlike the direct calls, a is not bound to the value "A". Rather, it is bound to an unevaluated promise comprising the expression i and the environment in which i should be evaluated (the environment the call was made from, in this case the global environment).

    On the second iteration of the loop, the inner function is called and needs to evaluate the symbol a, so it looks it up and finds that it is bound to a promise. By this time, however, i is no longer bound to "A" in the global environment but to "B". So when a is finally evaluated, it is "B", not "A" as you might expect:

    acc <- function (a) {
      return(function (b) {
        list(a = a, b = b)
      })
    }
    
    for(i in c("A", "B")) {
      acc <- acc(i)
    }
    
    acc
    #> $a
    #> [1] "B"
    #> 
    #> $b
    #> [1] "B"
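    The same effect can be reproduced without a loop by rebinding the variable between calling the factory and calling the manufactured function (a minimal sketch; make_pair and i are illustrative names):

    ```r
    make_pair <- function(a) function(b) list(a = a, b = b)

    i <- "A"
    f <- make_pair(i)  # `a` is now an unevaluated promise for `i`
    i <- "Z"           # rebind `i` before that promise is forced
    result <- f("B")   # forcing `a` now looks up `i` and finds "Z"

    i <- "Q"
    again <- f("C")$a  # still "Z": once forced, a promise keeps its value
    ```

    This gives result equal to list(a = "Z", b = "B"), and again equal to "Z".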
    

    The solution is to force evaluation of the variable a while it is still bound to the value you intend. This could be as simple as just putting the symbol a inside the function body so that R needs to resolve its value:

    acc <- function (a) {
      a
      return(function (b) {
        list(a = a, b = b)
      })
    }
    
    for(i in c("A", "B")) {
      acc <- acc(i)
    }
    
    acc
    #> $a
    #> [1] "A"
    #> 
    #> $b
    #> [1] "B"
    

    However, someone reviewing your code (including yourself at a later date) might think this dangling variable in your code is pointless and inadvertently delete it. For that reason, R contains the function force, which is just:

    force
    #> function (x) 
    #> x
    

    This seems like a bit of a pointless function, but it does two things. Firstly, it forces evaluation of the variable, and secondly, it makes it clear to anyone reading your code that this is precisely what you are trying to do.
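    To make the timing visible, here is a small sketch using an argument with a side effect; lazy_factory, eager_factory, counter and bump are hypothetical names used only for this demonstration.

    ```r
    counter <- 0
    # Increments `counter` in the environment where bump() was defined, then returns it.
    bump <- function() {
      counter <<- counter + 1
      counter
    }

    lazy_factory  <- function(a) function() a
    eager_factory <- function(a) {
      force(a)  # evaluate the argument now, at factory-call time
      function() a
    }

    f_lazy <- lazy_factory(bump())
    after_lazy <- counter    # bump() has not run yet

    f_eager <- eager_factory(bump())
    after_eager <- counter   # force() ran bump() immediately

    lazy_value <- f_lazy()   # the lazy promise is only forced here
    ```

    After this runs, after_lazy is 0, after_eager is 1, lazy_value is 2 and f_eager() returns 1.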

    The standard way to write acc so that it does what you intend would therefore be something like:

    acc <- function (a) {
      force(a)
      return(function (b) {
        list(a = a, b = b)
      })
    }
    
    for(i in c("A", "B")) {
      acc <- acc(i)
    }
    
    acc
    #> $a
    #> [1] "A"
    #> 
    #> $b
    #> [1] "B"
    

    and, for the more complex example:

    acc <- function (x) {
             force(x) 
             function (y) {
               force(y)
               function (z) {
                 base::list(x = x, y = y, z = z)
               }
             }
           }
    
    for (i in c("A","B","C")) { acc = acc(i) }
    acc
    #> $x
    #> [1] "A"
    #> 
    #> $y
    #> [1] "B"
    #> 
    #> $z
    #> [1] "C"
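    If more levels are needed, writing each force() by hand gets repetitive. One possible generalization is sketched below; curry_list is a hypothetical helper (not from base R or any package) that builds the curried function from a vector of argument names and forces every value as it arrives.

    ```r
    # Hypothetical helper: returns a function that takes one value per call and,
    # once it has received one value for each name, returns a named list.
    curry_list <- function(arg_names) {
      force(arg_names)
      step <- function(collected) {
        force(collected)
        function(value) {
          force(value)  # capture the value now, not a promise to it
          collected <- c(collected, list(value))
          if (length(collected) == length(arg_names)) {
            stats::setNames(collected, arg_names)
          } else {
            step(collected)
          }
        }
      }
      step(list())
    }

    acc <- curry_list(c("x", "y", "z"))
    for (i in c("A", "B", "C")) acc <- acc(i)
    ```

    With this sketch acc ends up as list(x = "A", y = "B", z = "C").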
    

    A detailed explanation with further examples can be found in the function factories chapter of "Advanced R". As described there:

    In general, this problem will arise whenever a binding changes in between calling the factory function and calling the manufactured function. This is likely to only happen rarely, but when it does, it will lead to a real head-scratcher of a bug.

    Prophetic words it seems.
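    The forced factory should also fix the while loop from the question. Here is a minimal sketch reusing the question's .iter and .i names, with the unnecessary <<- assignment replaced by a plain <-:

    ```r
    acc <- function(a) {
      force(a)  # capture .iter[.i + 1] while .i still has this iteration's value
      function(b) list(a = a, b = b)
    }

    .iter <- c("A", "B")
    .i <- 0
    while (.i < length(.iter)) {
      acc <- acc(.iter[.i + 1])
      .i <- .i + 1
    }
    ```

    Here acc ends up as list(a = "A", b = "B").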