I am trying to code a custom Probit function in Stan to improve my understanding of the Stan language and likelihoods. So far I've written the logarithm of the normal pdf but am receiving an error message that I've found to be unintelligible when I am trying to write the likelihood. What am I doing wrong?
Stan model
functions {
real normal_lpdf(real mu, real sigma) {
return -log(2 * pi()) / 2 - log(sigma)
- square(mu) / (2 * sigma^2);
}
real myprobit_lpdf(int y | real mu, real sigma) {
return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
}
}
data {
int N;
int y[N];
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
for (n in 1:N) {
target += myprobit_lpdf(y[n] | mu, sigma);
}
}
Error
PARSER EXPECTED: Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, : failed to parse Stan model 'Probit_lpdf' due to the above error.
R code to simulate data
## DESCRIPTION
# testing a Probit model
## DATA
N <- 2000
sigma <- 1
mu <- 0.3
u <- rnorm(N, 0, 2)
y.star <- rnorm(N, mu, sigma)
y <- ifelse(y.star > 0,1, 0)
data = list(
N = N,
y = y
)
## MODEL
out.stan <- stan("Probit_lpdf.stan",data = data, chains = 2, iter = 1000 )
The full error message is
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
error in 'model2a7252aef8cf_probit' at line 7, column 27
-------------------------------------------------
5: }
6: real myprobit_lpdf(real y, real mu, real sigma) {
7: return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
^
8: }
-------------------------------------------------
which is telling you that the normal_lpdf
function excepts three inputs and a vertical bar separating the first from the second.
It is also not a good idea to give your function the same name as a function that is already in the Stan language, such as normal_lpdf
.
But the functions you have written do not implement the log-likelihood of a probit model anyway. First, the standard deviation of the errors is not identified by the data, so you do not need sigma
. Then, the correct expressions would be something like
real Phi_mu = Phi(mu);
real log_Phi_mu = log(Phi_mu);
real log1m_Phi_mu = log1m(Phi_mu);
for (n in 1:N)
target += y[n] == 1 ? log_Phi_mu : log1m_Phi_mu;
although that is just a slow way of doing
target += bernoulli_lpmf(y | Phi(mu));