Tags: r, expression, metaprogramming, rlang

R - How to extract object names from expression


Given an rlang expression:

expr1 <- rlang::expr({
  d <- a + b
})

How can I retrieve the names of the objects referred to within the expression?

> extractObjects(expr1)
[1] "d" "a" "b"

Better yet, how can I retrieve the object names and categorise them as "required" (inputs) and "created" (outputs)?

> extractObjects(expr1)
$created
[1] "d"

$required
[1] "a" "b"

Solution

  • The base function all.vars does this:

    > all.vars(expr1)
    [1] "d" "a" "b"
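To see what all.vars() filters out, here is a small base R check (the expression `e` is made up for illustration). It drops names in function position and constants, and with its default `unique = TRUE` it reports each name only once:

```r
# A made-up expression: x appears twice, f is called, 2 is a constant
e <- quote(x + x * y + f(2))

all.vars(e)                  # "x" "y"
all.vars(e, unique = FALSE)  # "x" "x" "y"
```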
    

    Alternatively, you can use all.names to get all names in the expression rather than just those that aren’t used as calls or operators:

    > all.names(expr1)
    [1] "{"  "<-" "d"  "+"  "a"  "b"
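Comparing the two functions isolates the names that are used as functions or operators. This sketch uses base quote(), which builds the same expression as rlang::expr() when there is nothing to unquote. Treat it as a rough filter: a name used both as a function and as a variable would be dropped by setdiff():

```r
expr1 <- quote({ d <- a + b })

# Names that occur only in function (call) position
setdiff(all.names(expr1), all.vars(expr1))  # "{" "<-" "+"

# Unlike all.vars(), all.names() keeps repeats unless unique = TRUE
all.names(quote(a + a))                     # "+" "a" "a"
all.names(quote(a + a), unique = TRUE)      # "+" "a"
```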
    

    Don’t be misled: this result is correct! All of these appear in the expression, not just a, b and d.

    But it may not be what you want.

    In fact, I’m assuming what you want corresponds to the leaf tokens in the abstract syntax tree (AST) — in other words, everything except function calls (and operators, which are also function calls).

    The syntax tree for your expression looks as follows (see footnote 1):

       {
       |
       <-
       /\
      d  +
        / \
       a   b
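In R this tree is stored as nested calls, so it can be navigated with `[[ ]]`: element 1 of a call is the function being called and the remaining elements are its arguments. A small sketch, again using base quote() in place of rlang::expr():

```r
expr1 <- quote({ d <- a + b })

assign_call <- expr1[[2L]]  # the `<-` call inside the braces
assign_call[[1L]]           # `<-`: the function being called
assign_call[[2L]]           # d: a symbol, i.e. a leaf
assign_call[[3L]]           # a + b: another call, whose leaves are a and b
is.name(assign_call[[2L]])  # TRUE
is.call(assign_call[[3L]])  # TRUE
```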
    

    Getting this information means walking the AST:

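    leaf_nodes = function (expr) {
        if(is.call(expr)) {
            # A call: skip the function (element 1) and recurse into its arguments
            unlist(lapply(as.list(expr)[-1L], leaf_nodes))
        } else {
            # A leaf: a symbol, or a constant such as 1 or "x" (also returned)
            as.character(expr)
        }
    }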
    
    > leaf_nodes(expr1)
    [1] "d" "a" "b"
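One difference from all.vars(): leaf_nodes() also returns constants (leaf_nodes(quote(d <- a + 1)) gives "d" "a" "1") and keeps repeated names. A variant that keeps only symbols might look like this; leaf_symbols is a hypothetical name, not an existing function:

```r
# Hypothetical helper: like leaf_nodes(), but constants are skipped
leaf_symbols <- function(expr) {
  if (is.call(expr)) {
    # Skip the function (element 1) and recurse into the arguments
    unlist(lapply(as.list(expr)[-1L], leaf_symbols))
  } else if (is.name(expr)) {
    as.character(expr)
  } else {
    character(0)  # numbers, strings, NULL, ... are not object names
  }
}

leaf_symbols(quote(d <- a + 1))  # "d" "a"
```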
    

    Thanks to the AST representation we can also find inputs and outputs:

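    is_assignment = function (expr) {
        # is.name() keeps `&&` scalar when the function position holds a call.
        # assign("d", ...) is left out: its target is a string, which all.vars() would not report.
        is.call(expr) && is.name(expr[[1L]]) && as.character(expr[[1L]]) %in% c('=', '<-', '<<-')
    }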
    
    vars_in_assign = function (expr) {
        if (is.call(expr) && identical(expr[[1L]], quote(`{`))) {
            # A `{` block: only its first statement is inspected
            vars_in_assign(expr[[2L]])
        } else if (is_assignment(expr)) {
            # In `lhs <- rhs`, element 2 is the target and element 3 the value
            list(created = all.vars(expr[[2L]]), required = all.vars(expr[[3L]]))
        } else {
            stop('Expression is not an assignment')
        }
    }
    
    > vars_in_assign(expr1)
    $created
    [1] "d"
    
    $required
    [1] "a" "b"
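vars_in_assign() only looks at the first statement inside `{`. For a block of several statements, one possible extension (a sketch; vars_in_block is a hypothetical name) walks the statements in order and counts a name as required only if no earlier statement created it. It redefines is_assignment() so the block runs on its own:

```r
# Variant of is_assignment() from the answer, redefined so this block runs alone
is_assignment <- function(expr) {
  is.call(expr) && is.name(expr[[1L]]) &&
    as.character(expr[[1L]]) %in% c("=", "<-", "<<-")
}

# Hypothetical helper: classify names across every statement of a { block
vars_in_block <- function(expr) {
  # A bare statement is treated as a block with one statement
  stmts <- if (is.call(expr) && identical(expr[[1L]], quote(`{`))) {
    as.list(expr)[-1L]
  } else {
    list(expr)
  }
  created <- character(0)
  required <- character(0)
  for (stmt in stmts) {
    if (is_assignment(stmt)) {
      # Read the right-hand side first, so `x <- x + 1` lists x as required
      required <- union(required, setdiff(all.vars(stmt[[3L]]), created))
      # Complex targets such as d[x] have the same limitation as vars_in_assign()
      created <- union(created, all.vars(stmt[[2L]]))
    } else {
      # Other statements are assumed to only read names
      required <- union(required, setdiff(all.vars(stmt), created))
    }
  }
  list(created = created, required = required)
}

vars_in_block(quote({
  d <- a + b
  e <- d * c
}))
# $created: "d" "e"; $required: "a" "b" "c"
```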
    

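    Note that this function does not handle complex assignments (e.g. d[x] <- a + b or f(d) <- a + b) very well: all.vars(d[x]) returns "d" and "x", so x would be listed as created even though it is only read. It also only inspects the first statement inside a { block.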
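One way to handle these cases (a sketch, not a complete treatment): R evaluates a replacement call such as f(d) <- value roughly as d <- `f<-`(d, value = value), so the modified object is the innermost first argument of the left-hand side. The helper below (assign_vars is a hypothetical name) walks down to that symbol and, for complex assignments, counts every name on the left-hand side as required, since d must already exist and x is only read. Names after `$` or `@` would still be reported as if they were variables:

```r
# Hypothetical helper for a single `lhs <- rhs` call
assign_vars <- function(expr) {
  lhs <- expr[[2L]]
  target <- lhs
  # Follow the first argument of d[x], f(d), names(d)[2], ... down to the symbol
  while (is.call(target)) target <- target[[2L]]
  rhs_vars <- all.vars(expr[[3L]])
  required <- if (is.call(lhs)) {
    # Complex assignment: d is modified in place, index names such as x are read
    union(all.vars(lhs), rhs_vars)
  } else {
    rhs_vars
  }
  list(created = as.character(target), required = required)
}

assign_vars(quote(d[x] <- a + b))  # created "d"; required "d" "x" "a" "b"
assign_vars(quote(f(d) <- a + b))  # created "d"; required "d" "a" "b"
```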


    Footnote 1: lobstr::ast shows the syntax tree differently, namely as

    █─`{`
    └─█─`<-`
      ├─d
      └─█─`+`
        ├─a
        └─b

    … but the tree drawn earlier in the answer is more conventional outside R, and I find it more intuitive.
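Without lobstr, a rough text version of the same tree can be printed in base R by walking the call recursively. print_ast is a hypothetical helper: it prints the function of each call, then its arguments indented one level deeper:

```r
# Hypothetical helper: print a call tree, one node per line
print_ast <- function(expr, indent = "") {
  if (is.call(expr)) {
    head <- expr[[1L]]
    # The function being called: usually a symbol such as `<-`, occasionally a call
    cat(indent, if (is.name(head)) as.character(head) else deparse(head), "\n", sep = "")
    for (arg in as.list(expr)[-1L]) print_ast(arg, paste0(indent, "  "))
  } else {
    # A leaf: a symbol or a constant
    cat(indent, if (is.name(expr)) as.character(expr) else deparse(expr), "\n", sep = "")
  }
}

print_ast(quote({ d <- a + b }))
```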