Search code examples
rscopemodeling

How should I call model.frame in R?


I am trying to write my own modeling function in R, one which takes a formula, some data, and maybe some extra context, like weights; after calling model.frame to extract the necessary numeric data, it will perform a fit. My first pass looked like:

my_modfunc <- function(formula,data,weights=NULL) {
    mf <- model.frame(formula,data=data,weights=weights)
    wt <- model.weights(mf)
    # do some fitting here...
}

# make fake data to test it
set.seed(1234)
data <- data.frame(x1=rnorm(50),x2=rnorm(50),y=rnorm(50),w=runif(50))

# call it:
my_modfunc(y ~ x1 + x2,data=data,weights=w)

This fails, I get the error: Error in model.frame.default(formula, data = data, weights = weights) : invalid type (closure) for variable '(weights)'

Similarly, if I call

my_modfunc(y ~ x1 + x2,data=data,weights='w')

I get the same error. I suspect there is some problem with environment, quoting, and so on.

Cutting and pasting the source for lm, I could rewrite my function as

# based on lm
weird_modfunc <- function(formula,data,weights=NULL ) {
    cl <- match.call()  # what?
    mf <- match.call(expand.dots = FALSE)  # what??
    m <- match(c("formula", "data", "weights"), names(mf), 0L)
    mf <- mf[c(1L, m)]  # ??
    mf$drop.unused.levels <- TRUE # ??
    mf[[1L]] <- quote(stats::model.frame) ## ???
    mf <- eval(mf, parent.frame())
    wt <- as.vector(model.weights(mf))
    # do some fitting here...
}
# this runs without error:
weird_modfunc(y ~ x1 + x2,data=data,weights=w)
# this fails with the same error as above about variable lengths.
weird_modfunc(y ~ x1 + x2,data=data,weights='w')

The problem is that this contains multiple somewhat mystical incantations that I do not know how to interpret, modify or maintain.

What is the right way to call model.frame? Bonus points for making my function accept both weights=w and weights='w'


Solution

  • Welcome to the joys of non-standard evaluation. I suggest you base your function on the lm approach. It constructs a call to model.frame and evaluates it. That's necessary, because model.frame does non-standard evaluation, i.e., it accepts/expects a symbol for the weights parameter. Furthermore, it also ensures correct scoping regarding the formula's environment.

    weird_modfunc <- function(formula,data,weights=NULL ) {
      #cl not needed, lm only adds this call to the return object
      mf <- match.call(expand.dots = FALSE)
      message("Call with ellipses not expanded: ")
      #note that there are no ellipses in the function arguments for now, 
      #but you might want to change that later
      print(mf)
      #turn weights into symbol if character is passed
      if (is.character(mf$weights)) mf$weights <- as.symbol(mf$weights)
      m <- match(c("formula", "data", "weights"), names(mf), 0L)
      message("Position of formula, data and weights in the call:")
      print(m)
      mf <- mf[c(1L, m)]
      message("New call that only contains what is needed:")
      print(mf)
      mf$drop.unused.levels <- TRUE 
      message("Call with argument added:")
      print(mf)
      mf[[1L]] <- quote(stats::model.frame) 
      message("Change call to a call to model.frame:")
      print(mf)
      mf <- eval(mf, parent.frame()) #evaluate call
      wt <- as.vector(model.weights(mf))
      # do some fitting here...
      message("Return value:")
      wt
    }
    # this runs without error:
    weird_modfunc(y ~ x1 + x2,data=data,weights=w)
    #Call with ellipses not expanded: 
    #weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w)
    #Position of formula, data and weights in the call
    #[1] 2 3 4
    #New call that only contains what is needed:
    #weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w)
    #Call with argument added:
    #weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w, 
    #    drop.unused.levels = TRUE)
    #Change call to a call to model.frame:
    #stats::model.frame(formula = y ~ x1 + x2, data = data, weights = w, 
    #    drop.unused.levels = TRUE)
    #Return value:
    # [1] 0.35299850 0.98095832 0.53888276 0.44403386 0.94936678 0.45248337 0.19062580 0.99160915 0.54845545 0.76881577 0.91342167 0.68211200 0.40725142
    #[14] 0.40759230 0.14608279 0.19666771 0.19220934 0.40841440 0.34822131 0.83454285 0.19840001 0.86180531 0.39718531 0.15325377 0.33928338 0.36718044
    #[27] 0.42737908 0.18633690 0.65801660 0.92041138 0.73389406 0.88231927 0.95334653 0.19490154 0.47261674 0.38605066 0.37416586 0.02785566 0.92935521
    #[40] 0.41052928 0.95584022 0.27215284 0.51724649 0.97830984 0.36969649 0.31043044 0.03420963 0.66756585 0.92091638 0.04498960
    
    #this runs without error too:
    weird_modfunc(y ~ x1 + x2,data=data,weights='w')
    

    Here is a simpler version but there might be problems (well, more than usual with non-standard evaluation):

    my_modfunc <- function(formula,data,weights=NULL) {
      weights <- substitute(weights)
      if (!is.symbol(weights)) weights <- as.symbol(weights)
      #substitute the symbol into the call:
      mf <- eval(substitute(model.frame(formula,data=data,weights=weights)))
      wt <- model.weights(mf)
      # do some fitting here...
      wt
    }
    
    my_modfunc(y ~ x1 + x2,data=data,weights=w)
    #works
    my_modfunc(y ~ x1 + x2,data=data,weights="w")
    #works