I would like to write in Julia the following toy Bernoulli model written in JAGS:
model{
p0 ~ dbeta(1,1)
for(i in 1:n){
y[i] ~ dbern(p0)
}
}
Here's the complete driver
true_p0 <- 0.2
n <- 600
y <- sample(c(0,1), size=n, prob = c(1-true_p0, true_p0), replace=TRUE)
y[500:600] = NA # Some observations are missing!
data=list(y=y, n=n)
n_iter <- 1000
out<-jags.parallel(data=data, inits=NULL, n.iter=1000,
parameters.to.save = c('p0', 'y'),
model.file = 'basic_missing.bug',
n.chains = chain_count)
# The missing values are treated as parameters, so we can extract them:
print(sum(out$BUGSoutput$sims.list$y[,550])/nrow(out$BUGSoutput$sims.list$y))
# > [1] 0.232375
# The non-missing values are treated as constants, so even if we extract them, there is no variability:
print(sum(out$BUGSoutput$sims.list$y[,200])/nrow(out$BUGSoutput$sims.list$y))
# [1] 0
Here's my Julia model:
using Random
using Distributions
using Turing
using StatsPlots
using Plots
Random.seed!(3)
true_p0=0.2
v=cat(rand(Bernoulli(true_p0), 500), [missing for _ in 1:100], dims=1)
@model function bernoulli_model(v)
p0 ~ Beta(1,1)
v .~ Bernoulli(p0)
end
model=bernoulli_model(v)
nsamples = 1000
nchains = 8
sampler = MH()
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = 200, thinning=50);
> TaskFailedException
>
> nested task error: TaskFailedException
>
> nested task error: DomainError with 0.0:
> Beta: the condition α > zero(α) is not satisfied.
What do I do wrong?
I figured out the answer myself.
There are two issues here:
Bernoulli(p0)
expects the values to be integers, and throws a cryptic error when they are booleans. So the first thing to do is to convert the input vector to int by e.g. multiplying the bool values by 1.v .~ Bernoulli(p0)
, a vectorized (aka broadcasting) operator "~" is a wrong construct here, probably because the function dispatching happens only once, but we are talking about two completely different operations, that are serviced by two completely different overloads of operator ~ over the Bernoulli distribution. I suspect that one variant is a simple sampling variant that just edits the target log-likelihood, which is applied when "v[i]" is known. The other one is the variant that defines a latent random variable that represents the missing data, which is applied when the "v[i]" is missing. In order to force Julia to dispatch a different operator ~ overload dynamically based on the type of the argument, we need to use the actual loop.Here's the corrected code:
using Random
using Distributions
using Turing
Random.seed!(3)
true_p0=0.2
v=cat(rand(Bernoulli(true_p0), 500), [missing for _ in 1:100], dims=1)
@model function bernoulli_model(v)
p0 ~ Beta(1,1)
for i in eachindex(v)
v[i] ~ Bernoulli(p0)
end
end
model=bernoulli_model(v)model=bernoulli_model(v)
nsamples = 1000
nchains = 8
sampler = MH()
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = 200, thinning=50);