Search code examples
rmissing-databayesianrjagsrunjags

JAGS and missing data: Non-imputation solutions?


I am trying to fit a model with an interaction term using the rjags and runjags packages. I encountered the below reproduced error message RUNTIME ERROR: Compilation error on line 4. Possible directed cycle involving ymean and I believe this is due to the presence of missing values (NA) in my variables.

The goal of my current project is to compare rjags models that do and do not use multiple imputation, so I will conduct multiple imputation eventually, but I had assumed that the model would proceed with some default technique like casewise deletion with lm(), but I'm just ending up with an error message. Is there a way to force casewise deletion or another non-imputation approach within the model itself without data wrangling beforehand?

library(rjags)
#> Loading required package: coda
#> Linked to JAGS 4.3.1
#> Loaded modules: basemod,bugs
library(runjags)
library(coda)
data(mtcars)
data(airquality)

# If I use this, it'll actually work because `mtcars` has no missing values
# y <- mtcars$mpg
# x1 <- mtcars$disp
# x2 <- mtcars$drat
# nn <- length(y)

# But this one with missing values causes the error message
y <- airquality$Solar.R
x1 <- airquality$Wind
x2 <- airquality$Temp
nn <- length(y)

dataList = list(y = y, x1 = x1, x2 = x2, Ntotal = nn)

modelString = "
data {
  ymean <- mean(y); ysd <- sd(y); x1mean <- mean(x1); x1sd <- sd(x1); x2mean <- mean(x2); x2sd <- sd(x2)
  for (i in 1:Ntotal){
    zy[i] <- (y[i] - ymean)/ysd
    z1[i] <- (x1[i] - x1mean)/x1sd
    z2[i] <- (x2[i] - x2mean)/x2sd
  }
}
model{
  for (i in 1:Ntotal){
    zy[i] ~ dt(zbeta0 + zbeta1*z1[i] + zbeta2*z2[i] + zbetaInt*z1[i]*z2[i], 1/zsigma^2, nu)
  }
  
  zbeta0 ~ dnorm(0,1/2^2)
  zbeta1 ~ dnorm(0,1/2^2) 
  zbeta2 ~ dnorm(0,1/2^2) 
  zbetaInt ~ dnorm(0,1/2^2) 
  zsigma ~ dunif(0.00001, .99999)
  nu ~ dexp(.0333)
  sigma <- zsigma*ysd
}
"
writeLines(modelString, con = "modelString.txt")

myinits <- list(
  list(zbeta0 = rnorm(1,0,1), zbeta1 = rnorm(1,0,1), zbeta2 = rnorm(1,0,1), zbetaInt = rnorm(1,0,1), zsigma = runif(1), nu = runif(1)),
  list(zbeta0 = rnorm(1,0,1), zbeta1 = rnorm(1,0,1), zbeta2 = rnorm(1,0,1), zbetaInt = rnorm(1,0,1), zsigma = runif(1), nu = runif(1)),
  list(zbeta0 = rnorm(1,0,1), zbeta1 = rnorm(1,0,1), zbeta2 = rnorm(1,0,1), zbetaInt = rnorm(1,0,1), zsigma = runif(1), nu = runif(1)))

out <- run.jags(model="modelString.txt", 
                data = dataList, inits = myinits, n.chains = 3,
                adapt = 500, burnin = 2000, sample = 15000, 
                monitor = c("zbeta0", "zbeta1", "zbeta2", "zbetaInt", "sigma", "nu", "DIC"))
#> module dic loaded
#> Compiling rjags model...
#> Error: The following error occured when compiling and adapting the model using rjags:
#>  Error in rjags::jags.model(model, data = dataenv, inits = inits, n.chains = length(runjags.object$end.state),  : 
#>   RUNTIME ERROR:
#> Compilation error on line 4.
#> Possible directed cycle involving ymean
#> 
#> 
#> 
#> It may help to use failed.jags(c('model','data','inits')) to see model/data/inits syntax with line numbers
print(out)
#> Error in print(out): object 'out' not found

Solution

  • No, there is no way to get JAGS to do case-wise deletion of missing values or an argument to run.jags to do anything like that. The best I can suggest is to do something like:

    dataList <- data.frame(y = y, x1 = x1, x2 = x2)
    out <- run.jags(model= modelString, data = na.omit(dataList), ...)
    

    Then you will have to replace 'for (i in 1:Ntotal){' with 'for (i in 1:length(y)){' in your model and data block.

    However, one advantage of JAGS is that imputation is usually handled automatically within the model as long as the missing data is modelled as the response in a stochastic relationship. In your case you are modelling a function of y instead which is why you have this problem, but if you can reformulate the model so that y (rather than zy) is the response then you would not have a problem. Although of course adding data where the response is missing actually makes no difference to the model inference in any case, so I am not entirely sure why you want to do this.