Search code examples
rstanrstan

Specifying prior distribution on matrix in rstan


I am having trouble with getting a Bayesian mixed-effects model to yield stationary and well-mixed chains. I have created my own data so I know what parameters should be retrieved by the model. Unfortunately because the effective number of parameters is so low and the Rhat so high the parameter estimates are complete nonsense.

The data is designed so there are 60 subjects, split into three groups (g1, g2, g3) of 20 subjects each. Each subject is exposed to 3 conditions (cond1, cond2, cond3). I designed the data so there is no difference among the groups, but there are differences among the conditions, with cond1 scoring 100 on average, cond2 scoring 75 on average, and cond3 scoring 125.

df <- data.frame(id = factor(rep(1:60, 3)),
                 group = factor(rep(c("g1", "g2", "g3"), each = 20, length.out = 180)),
                 condition = factor(rep(c("cond1", "cond2", "cond3"), each = 60)),
                 score = c(ceiling(rnorm(60, 100, 15)), ceiling(rnorm(60, 75, 15)), ceiling(rnorm(60, 125, 15))))

Here are the descriptives

library(dplyr)
df %>% group_by(group, condition) %>% summarise(m = mean(score), sd = sd(score))

# group condition     m    sd
# <fct> <fct>     <dbl> <dbl>
# 1 g1    cond1     108    12.4
# 2 g1    cond2      79.4  13.1
# 3 g1    cond3     128    11.5
# 4 g2    cond1     105    15.5
# 5 g2    cond2      71.6  10.6
# 6 g2    cond3     127    17.7
# 7 g3    cond1     106    13.3
# 8 g3    cond2      75.8  17.6
# 9 g3    cond3     124    14.5

Everything looks to be correct, the differences between conditions are preserved nicely across groups.

Now for the the model. The model I am running has a grand mean, a parameter for group, a parameter for condition, a parameter for the group x condition interaction, and a subject parameter.

Here is the data list

##### Step 1: put data into a list
mixList <- list(N = nrow(df),
                nSubj = nlevels(df$id),
                nGroup = nlevels(df$group),
                nCond = nlevels(df$condition),
                nGxC = nlevels(df$group)*nlevels(df$condition),
                sIndex = as.integer(df$id),
                gIndex = as.integer(df$group),
                cIndex = as.integer(df$condition),
                score = df$score)

Now to build the model in rstan, saving the string as a .stan file using the cat() function

###### Step 2: build model
cat("
    data{
    int<lower=1> N;
    int<lower=1> nSubj;
    int<lower=1> nGroup;
    int<lower=1> nCond;
    int<lower=1,upper=nSubj> sIndex[N];
    int<lower=1,upper=nGroup> gIndex[N];
    int<lower=1,upper=nCond> cIndex[N];
    real score[N];
    }

    parameters{
    real a0;
    vector[nGroup] bGroup;
    vector[nCond] bCond;
    vector[nSubj] bSubj;
    matrix[nGroup,nCond] bGxC;
    real<lower=0> sigma_s;
    real<lower=0> sigma_g;
    real<lower=0> sigma_c;
    real<lower=0> sigma_gc;
    real<lower=0> sigma;
    }

    model{
    vector[N] mu;
    bCond ~ normal(100, sigma_c);
    bGroup ~ normal(100, sigma_g);
    bSubj ~ normal(0, sigma_s);
    sigma ~ cauchy(0,2)T[0,];
    for (i in 1:N){
    mu[i] = a0 + bGroup[gIndex[i]] + bCond[cIndex[i]] + bSubj[sIndex[i]] + bGxC[gIndex[i],cIndex[i]];
    }
    score ~ normal(mu, sigma);
    }

    ", file = "mix.stan")

Next is to generate the chains in rstan

##### Step 3: generate the chains
mix <- stan(file = "mix.stan",
            data = mixList,
            iter = 2e3,
            warmup = 1e3,
            cores = 1,
            chains = 1)

And here is the output

###### Step 4: Diagnostics
print(mix, pars = c("a0", "bGroup", "bCond", "bGxC", "sigma"), probs = c(.025,.975))

#                mean se_mean      sd      2.5%    97.5% n_eff Rhat
# a0         -1917.21  776.69 2222.64  -5305.69  1918.58     8 1.02
# bGroup[1]   2368.36 2083.48 3819.06  -2784.04  9680.78     3 1.54
# bGroup[2]   7994.87  446.06 1506.31   4511.22 10611.46    11 1.00
# bGroup[3]   7020.78 2464.68 4376.83     81.18 14699.90     3 1.91
# bCond[1]   -3887.06  906.99 1883.45  -7681.24  -247.48     4 1.60
# bCond[2]    4588.50  676.28 1941.92   -594.56  7266.09     8 1.10
# bCond[3]      73.91 1970.28 3584.74  -5386.96  5585.99     3 2.13
# bGxC[1,1]   3544.02  799.91 1819.18  -1067.27  6327.68     5 1.26
# bGxC[1,2]  -4960.08 1942.57 3137.33 -10078.84   317.07     3 2.66
# bGxC[1,3]   -396.35  418.34 1276.44  -2865.39  2543.45     9 1.42
# bGxC[2,1]  -2085.90 1231.36 2439.58  -5769.81  3689.38     4 1.46
# bGxC[2,2] -10594.89 1206.58 2560.42 -14767.50 -5074.33     5 1.02
# bGxC[2,3]  -6024.75 2417.43 4407.09 -12002.87  4651.14     3 1.71
# bGxC[3,1]  -1111.81 1273.66 2853.08  -4843.38  5572.87     5 1.48
# bGxC[3,2]  -9616.85 2314.56 4020.02 -15775.40 -4262.64     3 2.98
# bGxC[3,3]  -5054.27  828.77 2245.68  -8666.01  -321.74     7 1.00
# sigma         13.81    0.14    0.74     12.36    15.17    27 1.00

The low number of effective samples and high Rhats tell me I am doing something terribly wrong here, but what?

Is it not specifying a prior on bGxC?

How does one specify a prior on a matrix?


Solution

  • Matrices are inefficient in Stan (see here). It's better to use a vector of vectors:

    vector[nCond] bGxC[nGroup];
    

    And to set a prior:

    for(i in 1:nGroup){
      bGxC[i] ~ normal(0, sigma_gc);
    }
    

    And:

    for (i in 1:N){
      mu[i] = a0 + bGroup[gIndex[i]] + bCond[cIndex[i]] + bSubj[sIndex[i]] + bGxC[gIndex[i]][cIndex[i]];
    }