Search code examples
rlme4mixed-modelssjplotrstanarm

Why `parameters::model_parameters` throws an error for negative binomial `rstanarm` model


I am trying to create a table for stan_glmer.nb (rstanarm) output, but model_parameters from the package parameters throws an odd error, that I am unsure how to solve. Perhaps this is a bug.

Shortened sessionInfo() output for version info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

parameters_0.8.2
rstanarm_2.21.1

A reproducible example:

library(rstanarm)
library(parameters)

x<-rnorm(500)
dat<-data.frame(x=x,z=rep(c("A","B","C","D","E"),100), y=.2+x*.7)

mod1<-stan_glmer(y~x+(x|z),data=dat)

model_parameters(mod1, effects="all")

I will spare you the output here, because it isn't important, but the function works. Now the negative binomial model:

dat.nb<-data.frame(x=rnorm(500),z=rep(c("A","B","C","D","E"),100),
                y=rnbinom(500,size=1,prob = .5))

mod2<-stan_glmer.nb(y~x+(x|z),data=dat.nb)

model_parameters(mod2, effects="all")

Now an error message:

Error in `$<-.data.frame`(`*tmp*`, "parameter", value = c("(Intercept)",  : 
  replacement has 3 rows, data has 1

Although with parameters version 0.10.1, @BenBolker gets a blank output, instead of the error (see comments). Either way, it seems like this function isn't working for rstanarm discrete distributions (see comments). As far as I can see in the help documentation, there is nothing indicating the need to specify a negative binomial model. Furthermore, the function works fine for lme4 models:

library(lme4)
mod1<-lmer(y~x+(x|z),data=dat)

model_parameters(mod1, effects="all")

mod2<-glmer.nb(y~x+(x|z),data=dat.nb)

model_parameters(mod2, effects="all")

There are some model convergence issues, etc. with this simulated data, but model_parameters works for the glmer.nb model, but not the stan_glmer.nb model. Any idea what is going on here?


I have run into the same issue with a completely different dataset, and still can not figure out why "replacement" has 2 rows more than "data" in parameters::model_parameters (see error above). One additional row might be the reciprocal_dispersion parameter that the function isn't expecting, but not sure why the function would have a bug for the negative binomial glmms, which are quite common.

As a note, the tidy_stan function from sjPlot package still works for these models, but gives the warning:

Warning message:
'tidy_stan' is deprecated.
Use 'parameters::model_parameters()' instead.
See help("Deprecated") 

Yet, parameters::model_parameters(), as noted above, does not yet work.


Solution

  • Though it was quite a challenge, I finally figured out the bug (and has an easy fix, go to the end of the post if too long to read). I pooled the thread by finding the instruction that cause the error. Starting with:

    model_parameters(model = mod2, effects="all")
    

    Which fails at:

    parameters:::model_parameters.stanreg(model = mod2, effects="all", prior = T)
    

    Which fails at:

    params <- parameters:::.extract_parameters_bayesian(mod2, centrality = "median", 
    dispersion = F, ci = 0.89, ci_method = "hdi", 
    test = "pd", rope_range = "default", rope_ci = 1, 
    bf_prior = NULL, diagnostic = "ESS", priors = T, 
    effects = "fixed", standardize = NULL)
    

    which fails at:

    parameters <- bayestestR:::describe_posterior.stanreg(mod2, centrality = "median", 
    dispersion = F, ci = 0.89, ci_method = "hdi", 
    test = "pd", rope_range = "default", rope_ci = 1, 
    bf_prior = NULL, diagnostic = "ESS", priors = T)
    

    which fails at:

    priors_data <- bayestestR:::describe_prior.stanreg(mod2)
    

    which fails at:

    priors <- insight:::get_priors.stanreg(mod2)
    

    To find at what stage it failed from here on, I copied the source code of this function (now defined as GET_priors) and placed some strategic prints:

    GET_priors <- function(x) # Modified with tags to see where it fails
    {
      ps <- rstanarm::prior_summary(mod2)
      l <- insight:::.compact_list(lapply(ps[c("prior_intercept", "prior")], 
      function(.x) {
          if (!is.null(.x)) {
              if (is.na(.x$dist)) {
                  .x$dist <- "uniform"
                  .x$location <- 0
                  .x$scale <- 0
                  .x$adjusted_scale <- 0
              }
              .x <- do.call(cbind, .x)
              as.data.frame(.x)
          }
      }))
      
      print("STEP1")
      
      cn <- unique(unlist(lapply(l, colnames)))
      l <- lapply(l, function(.x) {
          missing <- setdiff(cn, colnames(.x))
          if (length(missing)) {
              .x[missing] <- NA
          }
          .x
      })
      
      print("STEP2")
      
      if(length(l) > 1)
      {
        prior_info <- do.call(rbind, l)
      }
      else
      {
        cn <- colnames(l[[1]])
        prior_info <- as.data.frame(l)
        colnames(prior_info) <- cn
      }
      
      print("STEP3")
      
      flat <- which(prior_info$dist == "uniform")
      if (length(flat) > 0) {
        prior_info$location[flat] <- NA
        prior_info$scale[flat] <- NA
        prior_info$adjusted_scale[flat] <- NA
      }
      
      print("STEP4")
      print(prior_info)
      print(insight:::find_parameters(x)$conditional)
      
      prior_info$parameter <- insight:::find_parameters(x)$conditional
      
      print("STEP4.1")
      
      prior_info <- prior_info[, intersect(c("parameter", 
                                             "dist", "location", "scale", "adjusted_scale"), 
                                           colnames(prior_info))]
      
      print("STEP4.2")
      
      colnames(prior_info) <- gsub("dist", "distribution", 
                                   colnames(prior_info))
      
      print("STEP4.3")
      
      colnames(prior_info) <- gsub("df", "DoF", colnames(prior_info))
      
      print("STEP4.4")
      
      priors <- as.data.frame(lapply(prior_info, function(x) {
        if (insight:::.is_numeric_character(x)) {
          as.numeric(as.character(x))
        }
        else {
          as.character(x)
        }
      }), stringsAsFactors = FALSE)
      
      print("STEP5")
      
      string <- strsplit(names(priors), "_", fixed = TRUE)
      string <- lapply(string, insight:::.capitalize)
      names(priors) <- unlist(lapply(string, paste0, collapse = "_"))
      priors
    }
    
    GET_priors(mod2)
    # [1] "STEP1"
    # [1] "STEP2"
    # [1] "STEP3"
    # [1] "STEP4"
    #                   dist location scale   adjusted_scale
    # prior_intercept normal        0   2.5             <NA>
    # prior           normal        0   2.5 2.63656782500616
    # [1] "(Intercept)"           "x"                     "reciprocal_dispersion"
    #  Error in `$<-.data.frame`(`*tmp*`, "parameter", value = c("(Intercept)",  : 
    #  replacement has 3 rows, data has 2 
    

    For some odd reason, its trying to add column of 3 rows to a data.frame with 2 rows (hence the error). However, the module that fails is related to the priors. We can obtain results by simply setting prior equal to F, avoiding all that branch in the code as in:

    model_parameters(model = mod2, effects="all", prior = F)
    # Fixed effects 
    # 
    # Parameter             |   Median |            CI |     pd | % in ROPE |  Rhat |  ESS
    # ------------------------------------------------------------------------------------
    #   (Intercept)           | 8.05e-03 | [-0.11, 0.13] | 54.00% |    81.15% | 1.002 | 1738
    # x                     |    -0.12 | [-0.25, 0.00] | 94.67% |    37.18% | 1.000 | 2784
    # reciprocal_dispersion |     0.97 | [ 0.75, 1.22] |   100% |        0% | 1.000 | 4463
    # 
    # # Random effects SD/Cor: z
    # 
    # Parameter       |    Median |            CI |     pd | % in ROPE |  Rhat |  ESS
    # -------------------------------------------------------------------------------
    #   (Intercept)     |  3.43e-03 | [ 0.00, 0.03] |   100% |    98.30% | 1.002 | 2077
    # x ~ (Intercept) | -9.39e-09 | [-0.01, 0.01] | 50.05% |    99.75% | 1.001 | 2099
    # x               |  2.93e-03 | [ 0.00, 0.02] |   100% |    99.08% | 1.001 | 2664
    # 
    # Using highest density intervals as credible intervals.
    

    Indeed, it is a bug and should be reported (only that the bug is in a dependency of a dependency; e.g. R-package "insight").