Search code examples
rmixed-modelsstanrstanrstanarm

Problematic `rstanarm::stan_lmer` - invalid internal subsetting


I'm running into the following problem, which appears to be related to subsetting done within rstanarm. See this related problem

Error in xj[i] : invalid subscript type 'list'

reproducible example:

library(stringi)
library(data.table)
library(Matrix)
library(rstanarm)
library(parallel)
options(mc.cores = 4)

set.seed(1423L)
# outcome variable is two level Gausian
n_grps <- 20
n_obs <- 100
theta_j <- as.list(rnorm(n_grps))
y_ij <- do.call("c", lapply(theta_j, function(j) {rnorm(n_obs, j)}))

# three indepdendent vars -- two random slopes and one fixed effect
x1 <- rep(rep(1:5, n_obs / 5), n_grps)
x2 <- rep(rep(5:1, n_obs / 5), n_grps)
x3 <- rnorm(n_grps * n_obs, mean = 10, sd= 10)
my_grps <- stringi::stri_rand_strings(n= n_grps, length= 10)

dt <- data.table(y_ij, grps= my_grps,
                 x1=x1, x2=x2, x3=x3)

# Error in xj[i] : invalid subscript type 'list'
mod1 <- rstanarm::stan_lmer(
  formula= y_ij ~ x3 + (x1 + x2 | grps),
  data= dt, gaussian(link = "identity"), 
  prior= normal(), prior_intercept= normal(),
  chains= 4, cores= 4, seed= 134L)

Operating environment

Toy example run on Mac OSX 10.13.3 Toy example and real data on Linux CentOS 7

Mac

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstanarm_2.17.3     Rcpp_0.12.15        Matrix_1.2-12       data.table_1.10.4-3 stringi_1.1.6      

loaded via a namespace (and not attached):
 [1] lattice_0.20-35    zoo_1.8-1          gtools_3.5.0       assertthat_0.2.0   digest_0.6.15      mime_0.5          
 [7] R6_2.2.2           plyr_1.8.4         stats4_3.4.3       ggplot2_2.2.1      colourpicker_1.0   pillar_1.1.0      
[13] rlang_0.1.6        lazyeval_0.2.1     minqa_1.2.4        miniUI_0.1.1       nloptr_1.0.4       DT_0.4            
[19] shinythemes_1.1.1  splines_3.4.3      shinyjs_1.0        lme4_1.1-15        stringr_1.2.0      loo_1.1.0         
[25] htmlwidgets_1.0    igraph_1.1.2       munsell_0.4.3      shiny_1.0.5        compiler_3.4.3     httpuv_1.3.5      
[31] rstan_2.17.3       pkgconfig_2.0.1    base64enc_0.1-3    rstantools_1.4.0   htmltools_0.3.6    tibble_1.4.2      
[37] gridExtra_2.3      codetools_0.2-15   matrixStats_0.53.1 threejs_0.3.1      dplyr_0.7.4        MASS_7.3-47       
[43] grid_3.4.3         nlme_3.1-131       xtable_1.8-2       gtable_0.2.0       magrittr_1.5       StanHeaders_2.17.2
[49] scales_0.5.0       reshape2_1.4.3     bindrcpp_0.2       dygraphs_1.1.1.4   xts_0.10-1         tools_3.4.3       
[55] glue_1.2.0         shinystan_2.4.0    markdown_0.8       crosstalk_1.0.0    survival_2.41-3    rsconnect_0.8.5   
[61] yaml_2.1.16        inline_0.3.14      colorspace_1.3-2   bayesplot_1.4.0    bindr_0.1  

Linux CentOS

R > sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libtatlas.so

Matrix products: default
BLAS/LAPACK: /usr/local/lib/libtatlas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] rstanarm_2.17.2     Rcpp_0.12.13        Matrix_1.2-10
[4] data.table_1.10.4-3 stringi_1.1.6

loaded via a namespace (and not attached):
 [1] lattice_0.20-35    zoo_1.8-0          gtools_3.5.0       assertthat_0.2.0
 [5] digest_0.6.12      mime_0.5           R6_2.2.2           plyr_1.8.4
 [9] stats4_3.4.1       ggplot2_2.2.1      colourpicker_1.0   rlang_0.1.2
[13] lazyeval_0.2.1     minqa_1.2.4        miniUI_0.1.1       nloptr_1.0.4
[17] DT_0.2             shinythemes_1.1.1  splines_3.4.1      shinyjs_1.0
[21] lme4_1.1-14        stringr_1.2.0      htmlwidgets_0.9    loo_1.1.0
[25] igraph_1.1.2       munsell_0.4.3      shiny_1.0.5        compiler_3.4.1
[29] httpuv_1.3.5       rstan_2.17.2       pkgconfig_2.0.1    base64enc_0.1-3
[33] rstantools_1.4.0   htmltools_0.3.6    tibble_1.3.4       gridExtra_2.3
[37] codetools_0.2-15   matrixStats_0.52.2 threejs_0.3.1      dplyr_0.7.4
[41] MASS_7.3-47        grid_3.4.1         nlme_3.1-131       xtable_1.8-2
[45] gtable_0.2.0       magrittr_1.5       StanHeaders_2.17.1 scales_0.5.0
[49] reshape2_1.4.2     bindrcpp_0.2       fbr_1.19.14        dygraphs_1.1.1.4
[53] xts_0.10-0         tools_3.4.1        glue_1.2.0         shinystan_2.4.0
[57] markdown_0.8       crosstalk_1.0.0    survival_2.41-3    rsconnect_0.8.5
[61] inline_0.3.14      colorspace_1.3-2   bayesplot_1.4.0    bindr_0.1

cross-reported as github issue: https://github.com/stan-dev/rstanarm/issues/254


Solution

  • This occurs due to the fact that you have a rogue argument in your call to stan_lmer that is evaluated positionally:

    mod1 <- rstanarm::stan_lmer(formula= y_ij ~ x3 + (x1 + x2 | grps), data= dt, 
    gaussian(link = "identity"), # this is interpreted as subset, not family!
    prior= normal(), prior_intercept= normal(), chains= 4, cores= 4, seed= 134L)
    

    The stan_lmer function does not have a family argument because the likelihood is implicitly hard-coded as family = gaussian(link = "identity"). Rather, the third argument to stan_lmer is subset and it throws the error you see from deep in the formula parsing code when it tries to interpret the list produced by gaussian as a logical vector for which observations to include (the error message is not great, but it comes from stats::model.frame rather than anything in the rstanarm namespace so there is nothing I can do about it).

    To fix this problem, just delete gaussian(link = "identity") from the call to stan_lmer.