I'm trying to run a three-level nested linear model in Rstan but keep receiving an error message.
I've taken inspiration from this three-level nested model:
http://rstudio-pubs-static.s3.amazonaws.com/64315_bc3a395edd104095a8384db8d9952f43.html
In essence I have repeated measures data for average broadband speed from four separate years (a total of 696 observations), nested within 174 local authorities, finally nested within 11 regions.
Here's the data:
yrid laid rnid speed
1 1 1 2.3
2 1 1 8.4
3 1 1 9.2
4 1 1 5.8
5 2 1 1.5
6 2 1 9.8
7 2 1 4.7
Code:
#load and prepare data
total<-read.csv("total.csv")
names(total)[names(total)=="X"] <- "yrid"
names(total)[names(total)=="lad.num"] <- "laid"
names(total)[names(total)=="region"] <- "rnid"
## Sort by laid for easier handling in Stan
total <- arrange(total, laid, rnid)
## Create a vector of school IDs where j-th element gives school ID
for class ID j
regionLookupVec <- unique(total[c("laid","rnid")])[,"rnid"]
## Combine as a stan dataset
Ni <- length(unique(total$yrid))
Nj <- length(unique(total$laid))
Nk <- length(unique(total$rnid))
laid <- (total$laid)
rnid <- (total$rnid)
regionLookup <- (regionLookupVec)
speed <- (total$sp)
## Combine as a stan dataset
dat <- (list(Ni = length(unique(Ni)),
Nj = length(unique(Nj)),
Nk = length(unique(Nk)),
laid = laid,
rnid = rnid,
regionLookup = regionLookupVec,
speed = speed))
#load model
stanmodelcode <- "data {
##Define variables in data
##Number of level-1 observations (an integer)
int<lower=0> Ni;
## Number of level-2 clusters
int<lower=0> Nj;
## Number of level-3 clusters
int<lower=0> Nk;
## Cluster IDs
int<lower=1> laid[Ni];
int<lower=1> rnid[Ni];
## Level 3 look up vector for level 2
int<lower=1> regionLookup[Nj];
## Continuous outcome
real speed[Ni];
## Continuous predictor
## real X_1ijk[Ni];
}
parameters {
## Define parameters to estimate
## Population intercept (a real number)
real beta_0;
## Population slope
## real beta_1;
## Level-1 errors
real<lower=0> sigma_e0;
## Level-2 random effect
real u_0jk[Nj];
real<lower=0> sigma_u0jk;
## Level-3 random effect
real u_0k[Nk];
real<lower=0> sigma_u0k;
}
transformed parameters {
## Varying intercepts
real beta_0jk[Nj];
real beta_0k[Nk];
## Individual mean
real mu[Ni];
## Varying intercepts definition
## Level-3 (level-3 random intercepts)
for (k in 1:Nk) {
beta_0k[k] <- beta_0 + u_0k[k];
}
## Level-2 (level-2 random intercepts)
for (j in 1:Nj) {
beta_0jk[j] <- beta_0k[regionLookup[j]] + u_0jk[j];
}
## Individual mean
for (i in 1:Ni) {
mu[i] <- beta_0jk[laid[i]];
}
}
model {
## Prior part of Bayesian inference
## Flat prior for mu (no need to specify if non-informative)
## Random effects distribution
u_0k ~ normal(0, sigma_u0k);
u_0jk ~ normal(0, sigma_u0jk);
## Likelihood part of Bayesian inference
## Outcome model N(mu, sigma^2) (use SD rather than Var)
for (i in 1:Ni) {
speed[i] ~ normal(mu[i], sigma_e0);
}
}"
#run model
resStan <-stan(model_code = stanmodelcode, data=dat,
iter=3000, chains=3, warmup=500, thin=10)}
Error:
COMPILING THE C++ CODE FOR MODEL 'stanmodelcode' NOW.
Error : mismatch in dimension declared and found in context;
processing stage=data initialization; variable name=laid; position=0;
dims declared=(1); dims found=(696) failed to create the sampler;
sampling not done
The error message says that it expects the dimension of the laid
variable to be 1, but the data actually passed to stan has a dimension of 696.
In your code you have
dat <- (list(Ni = length(unique(Ni)),
which says: assign Ni the value of the length of the unique values in Ni. What is Ni? You have
Ni <- length(unique(total$yrid))
Which I believe is 696. Therefore Ni = length(unique(696))
gives Ni = 1
, which is the source of the error.
Try changing your dat
variable to:
dat <- (list(Ni = Ni,
Nj = Nj,
Nk = Nk,
laid = laid,
rnid = rnid,
regionLookup = regionLookupVec,
speed = speed))