
Finding the names of all functions in an R expression


I'm trying to find the names of all the functions used in an arbitrary legal R expression, but I can't find a function that will flag this_is_a_function in the example below as a function rather than a name.

test <- expression(
    this_is_a_function <- function(var1, var2){

    this_is_a_function(var1-1, var2)
})

all.vars(test, functions = FALSE)

[1] "this_is_a_function" "var1"              "var2" 

all.vars(expr, functions = FALSE) seems to return function declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).
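To make the behaviour described above concrete, here is a minimal sketch using only base R. It compares all.vars() with all.names() on the same expression; the variable names vars, all_syms and only_called are made up for this illustration. The point is that a plain set difference drops this_is_a_function, because that name also appears as a variable (the assignment target).

```r
test <- expression(
    this_is_a_function <- function(var1, var2){

    this_is_a_function(var1-1, var2)
})

# Variable names only (the default): names in call position are skipped
vars <- all.vars(test)

# Every symbol, including names in the function position of a call
all_syms <- all.names(test)

# Names that occur as call heads and never as variables.
# this_is_a_function is missing here, since it is also an assignment target.
only_called <- setdiff(all_syms, vars)

print(vars)
print(only_called)
```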

Is there any function, in the core libraries or elsewhere, that will flag 'this_is_a_function' as a function, not a name? It needs to work on arbitrary expressions that are syntactically legal but might not evaluate correctly (e.g. '+'(1, 'duck')).

I've found similar questions, but they don't seem to contain the solution.

If clarification is needed, leave a comment below. I'm using the parser package to parse the expressions.

Edit: @Hadley

I have expressions which contain entire scripts. These usually consist of a main function containing nested function definitions, with a call to the main function at the end of the script.

Functions are all defined inside the expressions, and I don't mind if I have to include '<-' and '{', since I can easily filter them out myself.

The motivation is to take all my R scripts and gather basic statistics about how my use of functions has changed over time.

Edit: Current Solution

A regex-based approach grabs the function definitions, combined with the method in James' comment to grab function calls. It usually works, since I never use right-hand assignment.

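function_usage <- function(code_string){
    # Takes a script as a string and tabulates the names that are assigned
    # a function definition. Only left-hand assignment (`<-`, `=`) is handled.

    library(stringr)

    code_string <- str_replace(code_string, 'expression\\(', '')

    arrow_assign <- '.+[ \n]+<-[ \n]+function'
    equal_assign <- '.+[ \n]+=[ \n]+function'

    # str_extract_all() returns every match, not only the first one
    function_names <- vapply(
        strsplit(
            unlist(str_extract_all(code_string, arrow_assign)), split = '[ \n]+<-'),
        function(x) x[1], character(1))

    function_names <- c(function_names, vapply(
        strsplit(
            unlist(str_extract_all(code_string, equal_assign)), split = '[ \n]+='),
        function(x) x[1], character(1)))

    # drop the indentation captured in front of each name
    return(table(str_trim(function_names)))
}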

Solution

  • Short answer: is.function checks whether a variable actually holds a function. This does not work on (unevaluated) calls because they are calls. You also need to take care of masking:

    mean <- mean (x)
    

    Longer answer:

    IMHO there is a big difference between the two occurrences of this_is_a_function.

    In the first case you'll assign a function to the variable with name this_is_a_function once you evaluate the expression. The difference is the same as the difference between 2+2 and 4.
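    However, just finding a function definition on the right-hand side does not guarantee that the result is a function. If the definition is wrapped in parentheses and called immediately, f holds the returned value (here 3). Without the parentheses, R parses {x + 1} (2) as the function body, and f is still a function.

    f <- (function (x) {x + 1}) (2)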
    

    The second occurrence is syntactically a function call. You can determine from the expression that a variable called this_is_a_function which holds a function needs to exist in order for the call to evaluate properly. But you don't know whether it exists from that statement alone. However, you can check whether such a variable exists, and whether it is a function.
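A minimal sketch of that check, using base R's exists() and get(). The scratch environment env and the result variables are assumptions made for this illustration, so that the outcome does not depend on what is in the workspace. It also shows the masking case mentioned in the short answer.

```r
# Scratch environment so the check does not depend on the workspace
env <- new.env()

# Before the definition is evaluated, the name is not bound in env
before <- exists("this_is_a_function", envir = env, inherits = FALSE)

# Evaluate a definition inside env
eval(quote(this_is_a_function <- function(var1, var2) var1 + var2), envir = env)

# Now the name exists and holds a function
after <- exists("this_is_a_function", envir = env, mode = "function",
                inherits = FALSE)

# Masking: a non-function value stored under the name of an existing function
assign("mean", 42, envir = env)
masked_is_function <- is.function(get("mean", envir = env))  # looks at the value 42
# With mode = "function", non-function bindings are skipped during lookup,
# so the search continues up to the mean() function in base
still_callable <- exists("mean", envir = env, mode = "function")
```

The distinction in the last two lines mirrors how R resolves a call: when a name is used in call position, bindings that are not functions are skipped.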

    Because functions are stored in variables like any other type of data, in the first case you know that the result of function () will be a function. From that you can conclude that immediately after this expression is evaluated, the variable named this_is_a_function will hold a function.

    However, R is full of names and functions: "<-" is the name of the assignment function (a variable holding the assignment function) ...
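As a small illustration of this point, the sketch below (base R only, with made-up variable names) calls the assignment operator like an ordinary function and inspects the head of an unevaluated assignment. It also shows that the parser rewrites right-hand assignment into a call to `<-`.

```r
# Assignment written as an ordinary function call
`<-`(x, 5)

# The operator is looked up by name and the value found is a function
assign_is_function <- is.function(`<-`)

# In the unevaluated call, the head is just a name
head_class <- class(quote(x <- 5)[[1]])

# Right-hand assignment is turned into a call to `<-` by the parser
same_call <- identical(quote(5 -> x), quote(x <- 5))
```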

    After evaluating the expression, you can verify this by is.function (this_is_a_function). However, this is by no means the only expression that returns a function: Think of

    > f <- function () {g <- function (){}}
    > body (f)[[2]][[3]]
    function() {
    }
    > class (body (f)[[2]][[3]])
    [1] "call"
    > class (eval (body (f)[[2]][[3]]))
    [1] "function"
    

    all.vars(expr, functions = FALSE) seems to return function declarations (f <- function(){}) in the expression, while filtering out function calls ('+'(1,2), ...).

    I'd say it is the other way round: in that expression, f is the variable (name) which will be assigned the function (once the call is evaluated). + (1, 2) evaluates to a numeric, unless you keep it from doing so:

    > e <- expression (1 + 2)
    > e [[1]]
    1 + 2
    > e [[1]][[1]]
    `+`
    > class (e [[1]][[1]])
    [1] "name"
    > eval (e [[1]][[1]])
    function (e1, e2)  .Primitive("+")
    > class (eval (e [[1]][[1]]))
    [1] "function"
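Putting the observations above together, one possible approach is to walk the unevaluated expression and look only at its structure. The sketch below is not from the original answer: find_functions is a hypothetical helper, and it rests on the assumptions that only names in the function position of a call count as "called", that only direct assignments of the form name <- function(...) count as "defined", and that default argument values are not inspected. Nothing is evaluated, so expressions such as '+'(1, 'duck') are handled as well.

```r
# Hypothetical helper: walk an unevaluated expression and record
#   calls   - names found in the function position of a call
#   defined - names assigned a function definition via <-, <<- or =
find_functions <- function(expr) {
    calls <- character(0)
    defined <- character(0)

    walk <- function(x) {
        if (is.call(x)) {
            fun <- x[[1]]
            if (is.name(fun)) {
                fname <- as.character(fun)
                calls <<- c(calls, fname)
                # pattern: name <- function(...) body
                if (fname %in% c("<-", "<<-", "=") && length(x) == 3 &&
                    is.name(x[[2]]) && is.call(x[[3]]) &&
                    identical(x[[3]][[1]], as.name("function"))) {
                    defined <<- c(defined, as.character(x[[2]]))
                }
            }
        }
        if (is.call(x) || is.expression(x)) {
            # recurse only into nested calls; names, constants,
            # argument lists and empty arguments are skipped
            for (i in seq_along(x)) {
                if (is.call(x[[i]])) walk(x[[i]])
            }
        }
    }

    walk(expr)
    list(calls = unique(calls), defined = unique(defined))
}

test <- expression(
    this_is_a_function <- function(var1, var2){

    this_is_a_function(var1-1, var2)
})

res <- find_functions(test)
print(res$calls)
print(res$defined)

# Works on code that would fail if evaluated
bad <- find_functions(quote(`+`(1, "duck")))
```

For a whole script, parse(text = ...) or parse(file = ...) yields an expression that can be passed to the helper directly. Note that this reports what the code says syntactically; whether a called name really holds a function can only be checked after evaluation, as discussed above.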