why stan sampling do not match theoretical values?

I'm learning stan, and just tried a very simple model (bernoulli) like below, which I expect the posterior sampling to give a mean value of 0.3, because the prior is just a uniform distribution, but stan actually gives a mean value of 0.33. What is going on here?

By the way, I tried "optimizing" that gives 0.3, which is what I expected.

Thanks for your help!

model_code = "
data {
  int N;
  int y[N];
}

parameters {
  real theta;
}

model {
  theta ~ uniform(0, 1);
  y ~ bernoulli(theta);
}

"

data <- list(
  N = 10, 
  y = c(0, 1, 1, 0, 0, 1, 0, 0, 0, 0)
)

fit = stan(model_code=model_code, data=data, iter=5000)
print(fit)

model = stan_model(model_code=model_code)
mle = optimizing(model, data=data)
print(mle, digits=3)

> print(fit)
...
       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
theta  0.33    0.00 0.13  0.11  0.24  0.32  0.42  0.61  6920    1
lp__  -6.56    0.01 0.63 -8.32 -6.71 -6.31 -6.15 -6.11  6813    1


> print(mle, digits=3)
$par
theta 
  0.3 
...

Solution

This is the example I use in my intro Stan classes to explain exactly why Stan does match the theory. The short answer is that the beta distribution is skewed, so the mean doesn't match the mode.

With a uniform prior, that's the same as theta ~ beta(theta | 1, 1). With 3 successes and 7 failures as observations, the analytic posterior is beta(theta | 4, 8). The mode (optimum) is 3/10, whereas the mean is 4/12. Optimization gives you the right posterior mode estimate of 0.30 and sampling gives the correct posterior mean estimate of 0.33.

As the number of success (a) and failure (b) observations increase (i.e., a, b -> infinity), the posterior mode a / (a + b) and mean (a + 1) / (a + b + 2) approach the same limit, whihc is he empirical ratio a / (a + b). Until that limit, taking the mean provides an estimate with lower expected square error than taking the mode.

See: https://en.wikipedia.org/wiki/Beta_distribution