Tags: r, evaluation, iterable-unpacking, infix-operator

How does R evaluate these weird expressions?


I was trying to make Python 3-style assignment unpacking possible in R (e.g., a, *b, c = [1,2,3], "C"), and although I got so close (you can check out my code here), I ultimately ran into a few (weird) problems.

My code is meant to work like this:

a %,*% b %,% c <- c(1,2,3,4,5)

and will assign a = 1, b = c(2,3,4) and c = 5 (my code actually does do this, but with one small snag I will get to later).

In order for this to do anything, I have to define:

`%,%` <- function(lhs, rhs) {
   ...
}

and

`%,%<-` <- function(lhs, rhs, value) {
   ...
}

(as well as %,*% and %,*%<-, which are slight variants of the previous functions).

First issue: why R substitutes *tmp* for the lhs argument

As far as I can tell, R first evaluates this code from left to right (i.e., going from a to c) until it reaches the last %,%, whereupon it goes back from right to left, assigning values along the way. But the first weird thing I noticed is that when I call match.call() or substitute(lhs) in something like x %infix% y <- z, it says that the input to the lhs argument of %infix% is *tmp*, instead of, say, a or x.

This is bizarre to me, and I couldn't find any mention of it in the R manual or docs. I actually make use of this weird convention in my code (i.e., it doesn't show this behavior on the righthand side of the assignment, so I can use the presence of the *tmp* input to make %,% behave differently on this side of the assignment), but I don't know why it does this.
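To make the behavior concrete, here is a minimal sketch that logs what each function sees as lhs during a nested assignment. The operators %p% and %q%, the helper note(), and the environment trace_env are hypothetical names made up for this illustration; they are not the %,% functions from my code.

```r
# Hypothetical logging operators, only to show the order of calls and
# the expression each one receives as `lhs`.
trace_env <- new.env()
trace_env$calls <- character()

note <- function(name, lhs_expr) {
  trace_env$calls <- c(trace_env$calls,
                       paste0(name, " saw ", as.character(lhs_expr)))
}

# "Getter" form, called while R walks down the left-hand side.
`%p%` <- function(lhs, rhs) {
  note("%p%", substitute(lhs))
  lhs
}

# Replacement forms, called on the way back up.
`%p%<-` <- function(lhs, rhs, value) {
  note("%p%<-", substitute(lhs))
  value
}
`%q%<-` <- function(lhs, rhs, value) {
  note("%q%<-", substitute(lhs))
  lhs
}

a <- 1:3
a %p% 0 %q% 0 <- 9   # parsed as `<-`(`%q%`(`%p%`(a, 0), 0), 9)

print(trace_env$calls)
# "%p% saw *tmp*"  "%q%<- saw *tmp*"  "%p%<- saw *tmp*"
```

The getter runs first (left to right), then the replacement functions run from the outermost call back to the innermost, and every one of them is handed *tmp* rather than a.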

Second issue: why R checks for object existence before anything else

My second problem is what makes my code ultimately not work. I noticed that if you start with a variable name on the lefthand side of any assignment, R doesn't seem to even start evaluating the expression---it returns the error object '<variable name>' not found. I.e., if x is not defined, x %infix% y <- z won't evaluate, even if %infix% doesn't actually use or evaluate x.
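A small reproduction, assuming a hypothetical replacement function %infix%<- that deliberately never touches its lhs argument (undefined_target is likewise just a made-up name for a variable that does not exist yet):

```r
# Hypothetical replacement function that ignores `lhs` entirely.
`%infix%<-` <- function(lhs, rhs, value) {
  value
}

# The target does not exist yet, so the assignment fails before
# `%infix%<-` has any chance to run.
msg <- tryCatch({
  undefined_target %infix% 1 <- 2
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Once the variable exists (even as NULL), the same line works.
undefined_target <- NULL
undefined_target %infix% 1 <- 2
print(undefined_target)  # 2
```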

Why does R behave like this, and can I change it or get around it? If I could run the code in %,% before R checks whether x exists, I could probably hack it so that it wouldn't be a problem, and my Python unpacking code would be useful enough to actually share. But as it is now, the first variable needs to already exist, which is just too limiting in my opinion. I know that I could probably do something by changing the <- to a custom infix operator like %<-%, but then my code would be so similar to the zeallot package that I wouldn't consider it worth it. (It's already very close in what it does, but I like my style better.)

Edit:

Following Ben Bolker's excellent advice, I was able to find a way around the problem... by overwriting <-.

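`<-` <- function(x, value) {
  # Bind the base `=` locally so this function does not depend on any override of `=`.
  base::`<-`(`=`, base::`=`)
  # Make sure the left-most symbol of the target exists before the real assignment runs.
  find_and_assign(match.call(), parent.frame())
  # Hand the unevaluated target and value to the base `<-` in the caller's frame.
  do.call(base::`<-`, list(x = substitute(x), value = substitute(value)),
          quote = FALSE, envir = parent.frame())
}
find_and_assign <- function(expr, envir) {
  # Rebind the base operators locally so assignments in this helper
  # do not call the overridden `<-`.
  base::`<-`(`<-`, base::`<-`)
  base::`<-`(`=`, base::`=`)
  # Walk down the first argument of each nested call to reach the left-most target.
  while (is.call(expr))  expr <- expr[[2]]
  # Only a bare symbol can be pre-created.
  if (!rlang::is_symbol(expr)) return()
  var <- rlang::as_string(expr) # A little safer than `as.character()`
  # Create a NULL placeholder so that complex assignment finds the variable.
  if (!exists(var, envir = envir)) {
    assign(var, NULL, envir = envir)
  }
}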

I'm pretty sure that this would be a mortal sin though, right? I can't exactly see how it would mess anything up, but the tingling of my programmer senses tells me this would not be appropriate to share in something like a package... How bad would this be?


Solution

  • For your first question, about *tmp* (and maybe related to your second question):

    From Section 3.4.4 of the R Language definition:

    Assignment to subsets of a structure is a special case of a general mechanism for complex assignment:

    x[3:5] <- 13:15
    

    The result of this command is as if the following had been executed

    `*tmp*` <- x
    x <- "[<-"(`*tmp*`, 3:5, value=13:15)
    rm(`*tmp*`)
    

    Note that the index is first converted to a numeric index and then the elements are replaced sequentially along the numeric index, as if a for loop had been used. Any existing variable called *tmp* will be overwritten and deleted, and this variable name should not be used in code.

    The same mechanism can be applied to functions other than [. The replacement function has the same name with <- pasted on. Its last argument, which must be called value, is the new value to be assigned.

    I can imagine that your second problem has to do with the first step of the "as if" code: if R is internally trying to evaluate *tmp* <- x, it may be impossible to prevent it from trying to evaluate x at this point ...

    If you want to dig further, I think the internal evaluation code used to deal with "complex assignment" (as it seems to be called in the internal comments) is around here ...
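
As a rough illustration of the quoted passage, the sketch below defines a made-up replacement function, second<- (not a base R function), and checks that the short form and the manual "as if" expansion give the same result. It only shows the documented equivalence, not what the interpreter literally does internally.

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

# Short form, handled by R's complex-assignment mechanism.
v <- c(10, 20, 30)
second(v) <- 99

# Manual expansion, following the "as if" code from the language definition.
w <- c(10, 20, 30)
`*tmp*` <- w
w <- `second<-`(`*tmp*`, value = 99)
rm(`*tmp*`)

print(identical(v, w))  # TRUE
```

Note that the first line of the expansion reads the existing variable, which fits the guess above about why the target has to exist before the replacement function is called.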