I'm following along with this official IRT w/ STAN tutorial. The details of the model are copied below:
data {
int<lower=1> J; // number of students
int<lower=1> K; // number of questions
int<lower=1> N; // number of observations
int<lower=1,upper=J> jj[N]; // student for observation n
int<lower=1,upper=K> kk[N]; // question for observation n
int<lower=0,upper=1> y[N]; // correctness for observation n
}
parameters {
real delta; // mean student ability
real alpha[J]; // ability of student j - mean ability
real beta[K]; // difficulty of question k
}
model {
alpha ~ std_normal(); // informative true prior
beta ~ std_normal(); // informative true prior
delta ~ normal(0.75, 1); // informative true prior
for (n in 1:N)
y[n] ~ bernoulli_logit(alpha[jj[n]] - beta[kk[n]] + delta);
}
I'm not certain which variables do and do not need to be declared in R code.
toy_data <- list(
J= 5,
K = 4,
N =20,
y= c(1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0)
)
fit <- stan(file = '1PL_stan.stan', data = toy_data)
However, the following error is triggered.
Error in mod$fit_ptr() :
Exception: variable does not exist; processing stage=data initialization; variable name=jj; base type=int (in 'model920c4330dff_1PL_stan' at line 5)
In addition: Warning messages:
1: In readLines(file, warn = TRUE) :
incomplete final line found on 'C:\Users\jacob.moore\Downloads\1PL_stan.stan'
2: In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
'-E' not found
failed to create the sampler; sampling not done
In my past work, I've used python almost exclusively. So learning R has been quite the learning curve; additionally, I'm very new to STAN, hence the toy example.
The core idea is that there are 20 child/question pairings. 5 children and 4 different questions. I'm uncertain why my code is triggering the error, and what I should do to correct it. Can you clarify what needs adjustment for this code to run without triggering an error?
Every parameter listed in the data block (J
, K
, N
, jj
, kk
, and y
) needs to be included in the variable toy_data
. You've left out jj
and kk
.
You have 5 students (J=5
) answering 4 questions each (K=4
). jj
is the student ID, and kk
is the question ID, so assuming your responses are ordered by student and then by question, you would have something like
jj = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)
kk = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)