Search code examples
rshinyemmeans

Why do I need to hardcode the formula in emmeans() instead of dynamically extracting the formula from a dynamic linear model?


Edit: added a non-Shiny MRE, though that has a different error than the Shiny one...

Consider the following MRE (note that the app will not display correctly, this code is just to get it running for the MRE so you can test at browser().) This MRE is simulating a part of a larger app where the user has specified a model, seen significance in the ANOVA, and is looking to run the appropriate post-hocs. In this case post-hocs in the presence of heteroscedascitity, so we have to build a model with different variances per effect combination.

library(shiny)
require(emmeans)
require(nlme)
require(DT)

data<-structure(list(percent = c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 
                                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
                                 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
                                 2, 2, 1, 1, 1, 1, 1, 1, 1, 1),
                     cycle = c(4, 4, 4, 4, 4, 4, 4, 
                               4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
                               3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 
                               1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                     density = c(21, 
                                 20, 21, 21, 22, 20, 20, 19, 18, 19, 19, 19, 19, 18, 18, 19, 20, 
                                 22, 22, 21, 22, 22, 20, 21, 20, 20, 19, 19, 19, 19, 19, 19, 23, 
                                 25, 24, 24, 26, 23, 24, 24, 23, 22, 22, 23, 21, 24, 22, 22, 16, 
                                 17, 16, 15, 17, 16, 16, 16, 26, 27, 26, 25, 25, 26, 26, 26), 
                     run = c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 4L, 4L, 4L, 4L, 4L, 
                             4L, 4L, 4L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 
                             3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 2L, 2L, 2L, 
                             2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 
                             1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
                                                                                           -64L))


# Define UI for application
ui <- fluidPage(

    # Application title
    titlePanel("Blah"),

    # Sidebar with a slider input for number of bins 
    sidebarLayout(
        sidebarPanel(
        ),

        # Show a plot of the generated distribution
        mainPanel(
           #plotOutput("distPlot"),
           dataTableOutput("otherthing"),
           textOutput("post_hoc")
        )
    )
)

# Define server logic required to draw a histogram
server <- function(input, output) {
  
  aov_model <- reactiveVal(NULL)
  
  output$otherthing<-renderDataTable({
    formula<-"density~percent*cycle"#selected by user
    aov_mod <-lm(formula = formula(formula),data = data)
    data$cycle<-factor(data$cycle)
    data$percent<-factor(data$percent)
    aov_model(aov_mod)
    #browser()
    print(aov_mod[["coefficients"]])
  })
  
  output$post_hoc<-renderText({
    data$group<-interaction(data[c(1,2)])
    gls_formula<-formula(aov_model())
    #environment(gls_formula) <- .GlobalEnv
    model<-gls(gls_formula, data = data, weights = varIdent(form = ~1 | group))
    model2<-gls(density~percent*cycle, data = data, weights = varIdent(form = ~1 | group))
    browser()
    results<-emmeans(model,
                     pairwise~percent*cycle,
                     adjust = "tukey",
                     type = "response",
                     data=data)
    
    results2<-emmeans(model2,
                     pairwise~percent*cycle,
                     adjust = "tukey",
                     type = "response",
                     data=data)
    print(results2)
  })

Notice at the browser break that the results for the contrasts work fine if you hand type "pairwise ~ percent * cycle" for the gls formula later used in emmeans() (executing results2). But will not if you use the extracted formula from the reactive linear model (executing results) in emmeans(). In that case it will throw : Error in eval(call$model) : object 'gls_formula' not found and another error because of that.

In the real app, all of this is dynamically chosen by the user and potentially modified by the app, so I really need the gls model using the dynamically extracted formula to work instead of just rebuilding the selected effects.

I don't understand why an extracted formula that as far as I can tell is identical to the hard coded one doesn't work. I even tried setting the environment on the extracted one, but that doesn't seem to change anything.

Below is a non-Shiny example. It throws a different error, but perhaps it is related to the same issue? This error is: Error in call$model[[2]] : object of type 'symbol' is not subsettable

require(emmeans)
require(nlme)

data<-structure(list(percent = c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 
                                 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
                                 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
                                 2, 2, 1, 1, 1, 1, 1, 1, 1, 1),
                     cycle = c(4, 4, 4, 4, 4, 4, 4, 
                               4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
                               3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 
                               1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                     density = c(21, 
                                 20, 21, 21, 22, 20, 20, 19, 18, 19, 19, 19, 19, 18, 18, 19, 20, 
                                 22, 22, 21, 22, 22, 20, 21, 20, 20, 19, 19, 19, 19, 19, 19, 23, 
                                 25, 24, 24, 26, 23, 24, 24, 23, 22, 22, 23, 21, 24, 22, 22, 16, 
                                 17, 16, 15, 17, 16, 16, 16, 26, 27, 26, 25, 25, 26, 26, 26), 
                     run = c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 4L, 4L, 4L, 4L, 4L, 
                             4L, 4L, 4L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 
                             3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 2L, 2L, 2L, 
                             2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 
                             1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
                                                                                           -64L))



formula<-"density~percent*cycle" #selected by user
effect<-"percent*cycle" #selected by user
data$cycle<-factor(data$cycle)
data$percent<-factor(data$percent)

aov_mod <-lm(formula = formula(formula),data = data)

#later in the app
data$group<-interaction(data[c(1,2)])
gls_formula<-formula(aov_mod)
#environment(gls_formula) <- .GlobalEnv
model<-gls(gls_formula, data = data, weights = varIdent(form = ~1 | group)) #using extracted formula
model2<-gls(density~percent*cycle, data = data, weights = varIdent(form = ~1 | group)) #using hard coded formula
emmeans(model,
        formula(paste("pairwise ~",gsub(pattern = ":",replacement = "*",x = effect))),
        adjust = "tukey",
        type = "response",
        data=data)
emmeans(model2,
        formula(paste("pairwise ~",gsub(pattern = ":",replacement = "*",x = effect))),
        adjust = "tukey",
        type = "response",
        data=data)


Solution

  • At some point emmeans is hitting code where it tries to extract the formula from the $call element of the model to find the variables involved in the model. It finds the symbol gls_model and can't extract its second element (which would be the right-hand side, for a formula object. (In the Shiny case, it can't even find that object because it's in another environment somewhere.)

    The standard workaround for these kinds of problems (as in this example) is to embed your model call in do.call(), which induces an extra evaluation step:

    model <- do.call(gls,
       list(gls_formula, data = data, 
           weights = varIdent(form = ~1 | group)))
    

    As far as I can tell this should solve both your Shiny and your non-Shiny problem, but I've only tested the non-Shiny case ...