STAN IRT via R programming, issue with parameter declaration?


I'm following along with this official IRT w/ STAN tutorial. The details of the model are copied below:

data {
  int<lower=1> J;              // number of students
  int<lower=1> K;              // number of questions
  int<lower=1> N;              // number of observations
  int<lower=1,upper=J> jj[N];  // student for observation n
  int<lower=1,upper=K> kk[N];  // question for observation n
  int<lower=0,upper=1> y[N];   // correctness for observation n
}

parameters {
  real delta;         // mean student ability
  real alpha[J];      // ability of student j - mean ability
  real beta[K];       // difficulty of question k
}

model {
  alpha ~ std_normal();         // informative true prior
  beta ~ std_normal();          // informative true prior
  delta ~ normal(0.75, 1);      // informative true prior
  for (n in 1:N)
    y[n] ~ bernoulli_logit(alpha[jj[n]] - beta[kk[n]] + delta);
}

I'm not certain which variables do and do not need to be declared in R code.

toy_data <- list(
  J = 5,
  K = 4,
  N = 20,
  y = c(1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0)
)
fit <- stan(file = '1PL_stan.stan', data = toy_data)

However, the following error is triggered.

Error in mod$fit_ptr() : 
  Exception: variable does not exist; processing stage=data initialization; variable name=jj; base type=int  (in 'model920c4330dff_1PL_stan' at line 5)

In addition: Warning messages:
1: In readLines(file, warn = TRUE) :
  incomplete final line found on 'C:\Users\jacob.moore\Downloads\1PL_stan.stan'
2: In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
  '-E' not found
failed to create the sampler; sampling not done

In my past work I've used Python almost exclusively, so picking up R has been quite a learning curve. I'm also very new to Stan, hence the toy example.

The core idea is that there are 20 child/question pairings: 5 children, each answering 4 different questions. I'm uncertain why my code triggers the error and what I should change to correct it. Can you clarify what needs adjusting for this code to run without the error?


Solution

  • Every variable declared in the data block (J, K, N, jj, kk, and y) needs to be included in the list toy_data. You've left out jj and kk, which is why the error names jj as the variable that does not exist.

    You have 5 students (J=5) answering 4 questions each (K=4). jj is the student ID, and kk is the question ID, so assuming your responses are ordered by student and then by question, you would have something like

    jj = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)
    kk = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
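
Putting this together, a minimal sketch of the corrected data list might look like the following. It assumes the 20 responses in y are ordered by student and then by question, as described above. rep() is used here only as a compact way to generate the same jj and kk vectors, and the stan() call is left commented out because it needs rstan and the 1PL_stan.stan file from the question.

```r
J <- 5  # number of students
K <- 4  # number of questions

toy_data <- list(
  J  = J,
  K  = K,
  N  = J * K,                   # one observation per student/question pair
  jj = rep(1:J, each = K),      # student index: 1 1 1 1 2 2 2 2 ...
  kk = rep(1:K, times = J),     # question index: 1 2 3 4 1 2 3 4 ...
  y  = c(1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0)
)

# Sanity checks: every array declared with size N in the data block
# must actually have length N, and the indices must stay in range.
stopifnot(
  length(toy_data$jj) == toy_data$N,
  length(toy_data$kk) == toy_data$N,
  length(toy_data$y)  == toy_data$N,
  all(toy_data$jj >= 1 & toy_data$jj <= toy_data$J),
  all(toy_data$kk >= 1 & toy_data$kk <= toy_data$K)
)

# With rstan installed and the model file in the working directory:
# library(rstan)
# fit <- stan(file = "1PL_stan.stan", data = toy_data)
```

The names in the list have to match the names in the data block exactly; the order of the list elements does not matter.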