
Use of variable arguments (dot-dot-dot) in stats::lm in R


Suppose we have a function that makes a call to stats::lm and takes a formula and a data frame as arguments. Further arguments that we want to pass to stats::lm can be provided using variable arguments:

outer_function <- function(formula, data, ...) {
  z <- stats::lm(formula = formula, data = data, ...)
  return(z)
}

Now suppose we want to use this function and provide an additional argument (weights) that will be passed to stats::lm.

data <- data.frame(replicate(5, rnorm(100)))
weights <- replicate(100, 1)
formula <- X1 ~ X2 + X3

outer_function(formula = formula, data = data, weights = weights)

This produces the following error in stats::lm:

Error in eval(extras, data, env) : 
  ..1 used in an incorrect context, no ... to look in

Debugging the call to stats::lm, I see that the weights argument is passed to stats::lm correctly, but the result of match.call(), which lm later uses for evaluation, is

stats::lm(formula = formula, data = data, weights = ..1)

such that weights is assigned the first element of the ...-list, which is empty.

Can anybody elaborate on why this approach fails? In particular, if weights were passed as a literal scalar (say 5), the problem would not arise and the result of match.call() would be

stats::lm(formula = formula, data = data, weights = 5)

For now, I am using the following solution for my function:

outer_function <- function(formula, data, ...) {
  args <- list(formula = formula, data = data, ...)
  z <- do.call(stats::lm, args)
  return(z) 
}

This works, but I am still wondering whether there is a way to avoid do.call when the arguments in ... are vectors or lists.


Solution

  • I can't think of a work-around as safe and as succinct as do.call. I can explain what's going on, having debugged into the lm call.

    In the body of lm, you'll find the statement

    mf <- eval(mf, parent.frame())
    

    On the right hand side of the assignment, mf is the call

    stats::model.frame(formula = formula, data = data, weights = ..1, 
        drop.unused.levels = TRUE)
    

    and parent.frame() is the frame of the outer_function call (in other words, the evaluation environment of outer_function). eval is evaluating mf in parent.frame(). Due to S3 dispatch, what is ultimately evaluated in parent.frame() is the call

    stats::model.frame.default(formula = formula, data = data, weights = ..1, 
        drop.unused.levels = TRUE)
    

    In the body of model.frame.default, you'll find the statement

    extras <- eval(extras, data, env)
    

    On the right hand side of this assignment, extras is the call

    list(weights = ..1)
    

    specifying the arguments from mf matched to the formal argument ... of model.frame.default (just weights, in this case, because model.frame.default has formal arguments named formula, data, and drop.unused.levels); data is the data frame containing your simulated data; and env is your global environment. (env is defined earlier in the body of model.frame.default as environment(formula), which is indeed your global environment, because that is where you defined formula.)

    eval is evaluating extras in data with env as an enclosure. An error is thrown here, because the data frame data and your global environment env are not valid contexts for ..n. The symbol ..1 is valid only in the frame of a function with ... as a formal argument.
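
    The restriction on ..n can be seen in isolation. The following is a minimal sketch, not taken from the lm source; first_dot is a hypothetical helper introduced only for illustration:

    ```r
    # `..1` resolves inside a function that has `...` as a formal argument.
    first_dot <- function(...) ..1   # hypothetical helper, not part of any package
    val <- first_dot(10, 20)         # 10, the first element of `...`

    # Evaluated in a list with the global environment as enclosure (mirroring
    # eval(extras, data, env)), there is no `...` to look in, so an error is raised.
    res <- tryCatch(
      eval(quote(..1), list(), globalenv()),
      error = function(e) e
    )
    inherits(res, "error")           # TRUE
    ```

    The second call fails for the same reason as the eval of extras above: neither the list nor the global environment has a ... binding.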

    You might have deduced the issue from ?lm, which notes:

    All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.
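
    That lookup order can be checked by calling lm directly. This is a small sketch on made-up data; the names d, w and wts are assumptions for illustration only:

    ```r
    set.seed(1)
    # Hypothetical toy data; `w` exists only as a column of the data frame.
    d <- data.frame(y = rnorm(20), x = rnorm(20), w = runif(20, 0.5, 2))

    # `w` is looked up in `data` first, where it is found as a column.
    fit_col <- stats::lm(y ~ x, data = d, weights = w)

    # `wts` is not a column of `d`, so it is found in environment(formula),
    # which is the environment where the formula `y ~ x` was created.
    wts <- d$w
    fit_vec <- stats::lm(y ~ x, data = d, weights = wts)

    all.equal(coef(fit_col), coef(fit_vec))  # TRUE
    ```

    Neither lookup location has a ... binding, which is why a weights = ..1 argument cannot be resolved there.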

    There is no problem when weights is given the value of a constant (i.e., not the name of a variable bound in an environment and not a function call) in the outer_function call, because in that situation match.call does not substitute the symbol ..n. Hence

    outer_function(formula = formula, data = data, weights = 5)
    

    gets past this point (a different error is thrown, because a length-one weights value does not match the 100 rows of data), but

    weights <- 5
    outer_function(formula = formula, data = data, weights = weights)
    

    and

    outer_function(formula = formula, data = data, weights = rep(1, 100))
    

    don't.
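
    The substitution rule in match.call can be reproduced without lm. The sketch below assumes a reasonably recent version of R; show_call and wrap are hypothetical stand-ins for lm and outer_function respectively:

    ```r
    # Hypothetical stand-ins: show_call() plays the role of lm(),
    # wrap() the role of outer_function().
    show_call <- function(x, ...) match.call()
    wrap <- function(...) show_call(...)

    # A constant passed through `...` is inserted into the matched call as is.
    call_const <- wrap(x = 5)          # show_call(x = 5)

    # A symbol or a function call passed through `...` is replaced by ..1.
    w <- 5
    call_sym <- wrap(x = w)            # show_call(x = ..1)
    call_fun <- wrap(x = rep(1, 3))    # show_call(x = ..1)
    ```

    Only the first matched call can be evaluated outside the frame of wrap, which mirrors the three outer_function examples above.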