
# Scoping/non-standard evaluation issue with a `glm()` formula inside a function in R

I have a function that computes a table and a model (and more...):

```r
fun <- function(x, y, formula = y ~ x, data = NULL) {
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}
```

In the formula, I need to use both `x` and `y` as passed in the function call (e.g. `x = DF1$x` and `y = DF1$y`) and variables from another data frame (e.g. `a` and `b` from `DF2`). My naive function fails when called like this:

```r
fun(x = DF1$x,
    y = DF1$y,
    formula = y ~ x + a + b,
    data = DF2)
```

How can I make `glm()` look up `x` and `y` in the function's environment? I guess this issue is related to non-standard evaluation and/or scoping, but I have no idea how to fix it.

Data for the example:

```r
smp <- function(x = c(TRUE, FALSE),
                size = 1e2) {
  sample(x = x,
         size = size,
         replace = TRUE)
}

DF1 <- data.frame(x = smp(),
                  y = smp())

DF2 <- data.frame(a = smp(x = LETTERS),
                  b = smp(x = LETTERS))
```

## Solution

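Why not just add `x` and `y` to `data` inside the function? `glm()` looks up the formula's variables in `data` first and only then in the formula's environment, so once `x` and `y` are columns of `data` they are found alongside `a` and `b`.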

```r
fun <- function(x, y, formula = y ~ x, data = NULL) {
  if (length(x) != length(y)) {
    stop("x and y need to be the same length.")
  }
  if (is.null(data)) {
    # No extra data frame supplied: build one from x and y alone.
    data <- data.frame(x = x, y = y)
  } else {
    if (nrow(data) != length(x)) {
      stop("x, y and data need to be the same length.")
    }
    # Put x and y where glm() looks first: in `data`.
    data$x <- x
    data$y <- y
  }
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula,
                 family = binomial,
                 data = data)
  out
}

fun(x = DF1$x,
    y = DF1$y,
    formula = y ~ x + a + b,
    data = DF2)
# Output from one run (the example data are random, so yours will differ):
# $tab
#        y
# x       FALSE TRUE
#   FALSE    27   29
#   TRUE     21   23
#
# $mod
# Call:  glm(formula = formula, family = binomial, data = data)
#
# Coefficients:
# (Intercept)        xTRUE           aB           aC           aD           aE           aF           aG           aH           aI           aJ
#      3.2761      -1.8197       0.3409     -93.9103      -2.0697      20.6813     -41.5963      -1.1078      18.5921      -1.0857     -36.5442
#          aK           aL           aM           aN           aO           aP           aQ           aR           aS           aT           aU
#     -0.5730     -92.5513      -3.0672      22.8989     -53.6200      -0.9450       0.4626      -3.0672       0.3570     -22.8857       1.8867
#          aV           aW           aX           aY           aZ           bB           bC           bD           bE           bF           bG
#      2.5307      19.5447     -90.5693    -134.0656      -2.5943      -1.2333      20.7726     110.6790      17.1022      -0.5279      -1.2537
#          bH           bI           bJ           bK           bL           bM           bN           bO           bP           bQ           bR
#    -21.7750     114.0199      20.3766     -42.5031      41.1757     -24.3553      -2.0310     -25.9223      -2.9145      51.2537      70.2707
#          bS           bT           bU           bV           bW           bX           bY           bZ
#     -4.7728      -3.7300      -2.0333      -0.3906      -0.5717      -4.0728       0.8155      -4.4021
#
# Degrees of Freedom: 99 Total (i.e. Null);  48 Residual
# Null Deviance:        138.5
# Residual Deviance: 57.73  AIC: 161.7
#
# Warning message:
#   glm.fit: fitted probabilities numerically 0 or 1 occurred
```
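If you would rather have `glm()` find `x` and `y` in the function's environment, as the question literally asks, one possible alternative is to reassign the formula's environment inside the function. The following is only a sketch, not part of the answer above: the name `fun_env`, the fixed seed, and the numeric stand-ins for `a` and `b` are assumptions made for illustration.

```r
# Sketch: let glm() resolve x and y in the function's own environment.
# `fun_env` is a hypothetical name, not from the original post.
fun_env <- function(x, y, formula = y ~ x, data = NULL) {
  # Variables not found in `data` are looked up in the formula's
  # environment, so point that environment at this call's frame.
  environment(formula) <- environment()
  out <- list()
  out$tab <- table(x, y)
  out$mod <- glm(formula = formula, family = binomial, data = data)
  out
}

set.seed(1)  # assumption: fixed seed so the illustration is reproducible
DF1 <- data.frame(x = sample(c(TRUE, FALSE), 100, replace = TRUE),
                  y = sample(c(TRUE, FALSE), 100, replace = TRUE))
# Assumption: numeric a and b instead of the 26-level letters, to keep the fit small.
DF2 <- data.frame(a = rnorm(100), b = rnorm(100))

res <- fun_env(x = DF1$x, y = DF1$y, formula = y ~ x + a + b, data = DF2)
names(coef(res$mod))
# expected names: "(Intercept)", "xTRUE", "a", "b"
```

Here `a` and `b` still come from `data`, while `x` and `y` come from the function's arguments. The trade-off is that the formula no longer carries the environment it was created in, so any object it referenced there is found only if it is also reachable from the function's environment.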