Search code examples
rlazy-evaluationlazyeval

lazyeval not finding `C_logit_link` when using binomial in the glm function


I'm really scratching my head here. I really don't understand what is going on. This is a MWE, but the actual code and purpose is more complex then this. So the code:

library(dplyr)
ds <- mutate(iris, Species = as.numeric(Species == 'setosa'))

ds %>%
    do_(
        .dots = lazyeval::interp(
            "broom::tidy(stats::glm(form, data = ., family = distr))",
            form = Species ~ Sepal.Length,
            distr = binomial()
        )
    )

Which returns: Error in family$linkfun(mustart) : object 'C_logit_link' not found ... but this code bit works fine:

ds %>%
    do_(
        .dots = lazyeval::interp(
            "broom::tidy(stats::glm(form, data = ., family = distr))",
            form = Sepal.Width ~ Sepal.Length,
            distr = gaussian()
        )
    )

The only difference between the two is the family distribution used (gaussian vs binomial) and the variable used.

So the question: why is it that lazyeval can't find C_logit_link?


Solution

  • When you call interp(x, *), it evaluates the arguments that are to be interpolated into x. In the case of binomial(), the result is a structure that represents the binomial distribution in a GLM.

    interp(~x, x=binomial())
    
    #~list(family = "binomial", link = "logit", linkfun = function (mu) 
    #.Call(C_logit_link, mu), linkinv = function (eta) 
    #.Call(C_logit_linkinv, eta), variance = function (mu) 
    #mu * (1 - mu), dev.resids = function (y, mu, wt) 
    #.Call(C_binomial_dev_resids, y, mu, wt), aic = function (y, n, 
    #    mu, wt, dev) 
    #{
    #    m <- if (any(n > 1)) 
    #    . . .
    

    Buried inside that structure is a function that calls out to compiled C code, via the object C_logit_link. This is an unexported object in the stats package. Normally everything works fine, because the environment of that function is the stats namespace and so it's able to find C_logit_link.

    The problem here is that the object you're interpolating is a string, which means that everything interpolated into it is also coerced into a string. That loses the environment information necessary to find C_logit_link.

    The solution is to interp a formula instead:

    library(dplyr)
    ds <- mutate(iris, Species = as.numeric(Species == 'setosa'))
    
    ds %>%
        do_(
            .dots = lazyeval::interp(
                ~broom::tidy(stats::glm(form, data = ., family = distr)),  # formula
                form = Species ~ Sepal.Length,
                distr = binomial()
            )
        )
    
    #          term  estimate std.error statistic      p.value
    #1  (Intercept) 27.828521 4.8275611  5.764509 8.189574e-09
    #2 Sepal.Length -5.175698 0.8933984 -5.793270 6.902910e-09