Search code examples
rbayesianjagsr2jags

Latent variable estimation with binary and continuous indicators in JAGS


I am using R2jags to estimate latent factor using both binary and continuous variables. I was wondering wether jags command from R2jags can handle both binary and continuous variables at the same time. I tried to find articles explain the types of variables jags can handle, but I was not able to find any. If jags can account both binary and continuous in the same model, what is the correct way to specify the types of variables when creating a model?

Thanks for your help in advance.


Solution

  • Here's an example. First, I'll make some data:

    library(R2jags)
    set.seed(207)
    z <- rnorm(250)
    y_num <- apply(matrix(rnorm(750,0, .25), ncol=3), 2, \(x)x+z)
    y_bin <- apply(matrix(rnorm(750,0, .25), ncol=3), 2, \(x)plogis(x+z))
    y_bin <- apply(y_bin, 2, \(x)rbinom(length(x), 1, x))
    
    y_num <- scale(y_num)
    

    Next, we put the data in a list to feed to JAGS:

    datl <- list(
      N = nrow(y_num),
      y_num = y_num, 
      y_bin = y_bin, 
      a0=c(0,0), 
      A0 = diag(2)*.1
    )
    
    

    Next, we can write the model. Here, we are using a Bernoulli distribution to fit the binary variables and a normal distribution to fit the continuous variables (Note, the code below will write the model code to a file in your working directory):

    jags.mod <- "model{
      for(i in 1:N){
        for(j in 1:3){
          y_num[i,j] ~ dnorm(mu[i,j], tau[j])
          mu[i,j] <- b[j]*zeta[i]
        }
        for(j in 1:3){
          y_bin[i,j] ~ dbern(p[i,j])
          logit(p[i,j]) <- a[1,j] + a[2,j]*zeta[i]
        }
      zeta[i] ~ dnorm(0,1)
      }
      b[1] <- 1
      b[2] ~ dnorm(0,.1)T(0, )
      b[3] ~ dnorm(0,.1)T(0, )
      for(j in 1:3){
        a[1:2, j] ~ dmnorm(a0[1:2], A0[1:2,1:2])
        tau[j] ~ dt(0,.1, 1)T(0,)
      }
    }"
    cat(jags.mod, file="jags_model.txt")
    

    Now we can run the model:

    jgs <- jags(datl, 
                inits=NULL, 
                parameters.to.save = c("a", "b", "tau", "zeta"), 
                model.file = "jags_model.txt")
    #> module glm loaded
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 1500
    #>    Unobserved stochastic nodes: 258
    #>    Total graph size: 4776
    #> 
    #> Initializing model
    

    Finally, since we made the data, we know the true latent variable, so we can evaluate the similarity between the estimated latent variable and the true latent variable:

    zeta_hat <- jgs$BUGSoutput$summary[,1]
    zeta_hat <- zeta_hat[grep("zeta", names(zeta_hat))]
    # correlate estimated latent variable (zeta_hat) with true latent variable (z). 
    cor(zeta_hat, z)
    #> [1] 0.99146
    

    Created on 2022-10-19 by the reprex package (v2.0.1)