Why is match.call useful?


In the body of some R functions, for example lm, I see calls to the match.call function. As its help page says, when used inside a function, match.call returns the call with all argument names spelled out in full, and this is supposed to be useful for passing a large number of arguments on to other functions.
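As a small illustration of that behaviour (the function f and its arguments are made up for this sketch and have nothing to do with lm), match.call turns positional and partially named arguments into fully named ones and leaves out anything that was not supplied:

```r
# Hypothetical toy function, used only for illustration
f <- function(alpha, beta, gamma) {
  match.call()
}

# Second argument left out, third one partially named
cl <- f(1, gam = 3)
print(cl)          # f(alpha = 1, gamma = 3)
print(names(cl))   # "" "alpha" "gamma"
```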

For example, in the lm function we see a call to the function model.frame...

function (formula, data, subset, weights, na.action, method = "qr", 
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
contrasts = NULL, offset, ...) 
{
  cl <- match.call()
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "subset", "weights", "na.action", 
      "offset"), names(mf), 0L)
  mf <- mf[c(1L, m)]

  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())
  ...

...Why is this more useful than making a straight call to model.frame specifying the argument names as I do next?

function (formula, data, subset, weights, na.action, method = "qr", 
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
contrasts = NULL, offset, ...) 
{
  mf <- model.frame(formula = formula, data = data,
                    subset = subset, weights = weights, na.action = na.action)
  ...

(Note that match.call has another use that I do not discuss here: storing the call in the resulting object.)


Solution

  • One reason that is relevant here is that match.call captures the language of the call without evaluating it, and in this case it allows lm to treat some of the "missing" variables as "optional". Consider:

    lm(x ~ y, data.frame(x=1:10, y=runif(10)))
    

    Vs:

    lm2 <- function (
      formula, data, subset, weights, na.action, method = "qr", 
      model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
      contrasts = NULL, offset, ...
    ) {
      mf <- model.frame(
        formula = formula, data = data, subset = subset, weights = weights
      ) 
    }
    lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
    ## Error in model.frame.default(formula = formula, data = data, subset = subset,  :
    ##   invalid type (closure) for variable '(weights)'
    

    In lm2, weights was never supplied, yet weights = weights still passes the symbol weights on to model.frame. model.frame then looks that symbol up in data and, failing that, in the formula's environment, where it finds the stats::weights function, which is clearly not what was intended. You could get around this by testing for missingness before you call model.frame, but at that point match.call starts looking pretty good. Look at what happens if we debug the call:

    debug(lm2)
    lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
    ## debugging in: lm2(x ~ y, data.frame(x = 1:10, y = runif(10)))
    ## debug at #5: {
    ##     mf <- model.frame(formula = formula, data = data, subset = subset,
    ##         weights = weights)
    ## }
    Browse[2]> match.call()
    ## lm2(formula = x ~ y, data = data.frame(x = 1:10, y = runif(10)))
    

    match.call doesn't involve the missing arguments at all.

    You could argue that the optional arguments should have been made explicitly optional via default values, but that's not what happened here.
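To make the contrast concrete, here is a minimal sketch of the match.call idiom applied to a stripped-down wrapper. The names lm3, d and w are hypothetical and exist only for this illustration; the body mirrors the lines of lm quoted in the question rather than the full lm source.

```r
# Hypothetical wrapper: builds the model.frame call from what the user typed
lm3 <- function(formula, data, subset, weights, na.action, offset, ...) {
  mf <- match.call(expand.dots = FALSE)
  # Positions of the arguments model.frame understands; 0L for those not supplied
  keep <- match(c("formula", "data", "subset", "weights", "na.action", "offset"),
                names(mf), 0L)
  mf <- mf[c(1L, keep)]                   # drop everything else (zeros are ignored)
  mf[[1L]] <- quote(stats::model.frame)   # swap the function being called
  eval(mf, parent.frame())                # evaluate in the caller's environment
}

d <- data.frame(x = 1:10, y = runif(10), w = rep(2, 10))

mf1 <- lm3(x ~ y, d)               # weights not supplied: no error
mf2 <- lm3(x ~ y, d, weights = w)  # w is looked up in d by model.frame

print(names(mf1))
print(names(mf2))
```

Because the call is rebuilt from the arguments that were actually supplied, weights never appears in the model.frame call unless the user passed it, and when it is passed it travels on as the original expression w rather than as the symbol weights.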