Search code examples
rstan

How to correctly pass formulas associated with a variable name with random effects into fitted regression models in `R`?


I currently have a problem in that I have to pre-specify my formulas before sending them into a regression function. For example, using the stan_gamm4 function in R, we have the following example:

dat <- mgcv::gamSim(1, n = 400, scale = 2) ## simulate 4 term additive truth
## Now add 20 level random effect `fac'...
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~ (1 | fac), 
                 chains = 1, iter = 200) # for example speed

Now, because the formula and random formula were specified explicitly, then if we call:

br$call$random
> ~(1 | fac)

We are able to retrieve the form of the random effects.

NOW, let us then leave everything the same, BUT use an expression for the random part:

formula.rand <- as.formula( '~(1|fac)' )

Then, if we did the same thing before, but with formula.rand taking the place, we have:

br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = formula.rand, 
                     chains = 1, iter = 200) # for example speed

BUT NOW: we have that:

br$call$random
> formula.rand

Instead of the original. A lot of bayesian packages rely on accessing br$call$random, so is there a way to use a variable for formula, have it pass in, AND retain the original relation when calling br$call$random? Thanks.


Solution

  • While I haven't used Stan, this is a problem inherent in the way that R handles storing calls. You can see it happening with lm, for example:

    model <- function(formula)
    {
        lm(formula, data=mtcars)
    }
    m <- model(mpg ~ disp)
    m$call$formula
    # formula
    

    The simplest solution is to construct the call using substitute to insert the actual values you want to keep, not the symbol name. In the case of lm, this would be something like

    model2 <- function(formula)
    {
        call <- substitute(lm(formula=.f, data=mtcars), list(.f=formula))
        eval(call)
    }
    m2 <- model2(mpg ~ disp)
    m2$call$formula
    # mpg ~ disp
    

    For Stan, you can do

    stan_call <- substitute(br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data=dat, random=.rf,
                     chains=1, iter=200),
                 list(.rf=formula.rand))
    br <- eval(stan_call)