
Scoping/non-standard evaluation issue with the formula passed to `glm()` inside a function in R


I have a function that computes a table and a model (and more...):

fun <- function(x, y, formula = y ~ x, data = NULL) {
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

In the formula, I need to use x and y as provided in the function call (e.g. x = DF1$x and y = DF1$y) and variables from another data frame (e.g. a and b from DF2). It fails with my naive function:

fun(x = DF1$x,
    y = DF1$y,
    formula = y ~ x + a + b,
    data = DF2)
# Error in eval(predvars, data, env) : object 'y' not found

How can I make glm() look up x and y in the function's environment? I suspect this is related to non-standard evaluation and/or scoping, but I have no idea how to fix it.
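A quick way to see what is going on: a formula stores the environment it was created in, and model.frame() (which glm() calls; the eval(predvars, data, env) in the error message comes from there) looks up any variable it cannot find in data in environment(formula). The sketch below uses a hypothetical helper, show_env(), to check which environment a formula carries when it is a default argument versus when it is typed at the call site.

```r
# Hypothetical helper: does the formula carry the global environment?
show_env <- function(formula = y ~ x) {
  identical(environment(formula), globalenv())
}

# The default argument is evaluated inside show_env(), so the formula's
# environment is show_env()'s own frame, not the global environment.
show_env()
# FALSE

# A formula written at the top level carries the global environment, where
# the question's example defines no `x` or `y`: hence "object 'y' not found".
show_env(y ~ x)
# TRUE when run at the top level
```

By the same logic, calling fun() with the default formula and no data should work, because the default y ~ x is created inside fun() itself.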

Data for the example:

smp <- function(x = c(TRUE, FALSE),
                size = 1e2) {
  sample(x = x,
         size = size,
         replace = TRUE)
}

DF1 <- data.frame(x = smp(),
                  y = smp())

DF2 <- data.frame(a = smp(x = LETTERS),
                  b = smp(x = LETTERS))
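Because smp() draws random samples, DF1 and DF2 (and therefore the table and coefficients in the answer below) change from run to run. A minimal sketch of a reproducible version fixes the random seed with set.seed() before generating the data; the seed value 42 is an arbitrary choice, so the resulting numbers need not match those shown in the answer.

```r
# Fix the RNG seed so DF1 and DF2 are identical on every run (42 is arbitrary).
set.seed(42)

smp <- function(x = c(TRUE, FALSE),
                size = 1e2) {
  sample(x = x,
         size = size,
         replace = TRUE)
}

DF1 <- data.frame(x = smp(),
                  y = smp())

DF2 <- data.frame(a = smp(x = LETTERS),
                  b = smp(x = LETTERS))

# Sanity check: 100 rows each, logical x and y, letters in a and b
str(DF1)
str(DF2)
```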

Solution

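  • Why not just add x and y to data inside the function? glm() looks up formula variables in data first, so once x and y are columns of data they are found alongside a and b: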

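    fun <- function(x, y, formula = y ~ x, data = NULL) {
      # With no data supplied, start from an empty data frame with one row per observation
      if (is.null(data)) data <- data.frame(row.names = seq_along(x))
      if (length(x) != length(y) || length(x) != nrow(data))
        stop("x, y and data need to be the same length.")
      # Copy x and y into data so glm() finds them there
      # (note: this overwrites any existing columns named x or y)
      data$x <- x
      data$y <- y
      out <- list()
      out$tab <- table(x, y)
      out$mod <- glm(formula = formula,
                     family = binomial,
                     data = data)
      out
    }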
    
    fun(x = DF1$x,
        y = DF1$y,
        formula = y ~ x + a + b,
        data = DF2)
    # $tab
    #        y
    # x       FALSE TRUE
    #   FALSE    27   29
    #   TRUE     21   23
    # 
    # $mod
    # Call:  glm(formula = formula, family = binomial, data = data)
    # 
    # Coefficients:
    #   (Intercept)        xTRUE           aB           aC           aD           aE           aF           aG           aH           aI           aJ
    #        3.2761      -1.8197       0.3409     -93.9103      -2.0697      20.6813     -41.5963      -1.1078      18.5921      -1.0857     -36.5442
    #            aK           aL           aM           aN           aO           aP           aQ           aR           aS           aT           aU
    #       -0.5730     -92.5513      -3.0672      22.8989     -53.6200      -0.9450       0.4626      -3.0672       0.3570     -22.8857       1.8867
    #            aV           aW           aX           aY           aZ           bB           bC           bD           bE           bF           bG
    #        2.5307      19.5447     -90.5693    -134.0656      -2.5943      -1.2333      20.7726     110.6790      17.1022      -0.5279      -1.2537
    #            bH           bI           bJ           bK           bL           bM           bN           bO           bP           bQ           bR
    #      -21.7750     114.0199      20.3766     -42.5031      41.1757     -24.3553      -2.0310     -25.9223      -2.9145      51.2537      70.2707
    #            bS           bT           bU           bV           bW           bX           bY           bZ
    #       -4.7728      -3.7300      -2.0333      -0.3906      -0.5717      -4.0728       0.8155      -4.4021
    # 
    # Degrees of Freedom: 99 Total (i.e. Null);  48 Residual
    # Null Deviance:        138.5 
    # Residual Deviance: 57.73  AIC: 161.7
    # 
    # Warning message:
    #   glm.fit: fitted probabilities numerically 0 or 1 occurred 
    #
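An alternative that leaves data untouched is to give the formula an environment in which x and y are defined. The sketch below (fun2() is a hypothetical name, not part of the answer above) builds a new environment holding x and y with list2env(), whose parent is the formula's original environment, and attaches it with `environment<-`. model.frame() then finds a and b in data, x and y in the new environment, and anything else where the formula was originally created.

```r
fun2 <- function(x, y, formula = y ~ x, data = NULL) {
  # New environment holding x and y; names not found here fall back to the
  # environment the formula was originally created in.
  environment(formula) <- list2env(list(x = x, y = y),
                                   parent = environment(formula))
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

# Example data as in the question (seed fixed for reproducibility)
set.seed(1)
smp <- function(x = c(TRUE, FALSE), size = 1e2) {
  sample(x = x, size = size, replace = TRUE)
}
DF1 <- data.frame(x = smp(), y = smp())
DF2 <- data.frame(a = smp(x = LETTERS), b = smp(x = LETTERS))

# With 51 coefficients fitted to 100 random rows, glm() is likely to warn
# about fitted probabilities of 0 or 1 (as in the output above), so the
# warnings are suppressed here.
res <- suppressWarnings(
  fun2(x = DF1$x,
       y = DF1$y,
       formula = y ~ x + a + b,
       data = DF2)
)
res$tab
```

Unlike copying x and y into data, this never overwrites columns of data. Columns named x or y in data would still take precedence, though, because model.frame() searches data before environment(formula).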