I have a question about the how to calculate differences between the coefficients (categorical variables) of glm in Rstan.
As example, I used iris dataset in R to judge whether I can calculate the posterior distribution of differences of coefficients.
At first, I conducted a basic glm procedure like below and calculate the significant differences of coefficients.
library(tidyverse)
library(magrittr)
library(multcomp)
iris_glm <-
glm(Sepal.Length ~ Species, data = iris)
multcomp::glht(iris_glm, linfct = mcp(Species = "Tukey")) %>%
summary(.) %>%
broom::tidy()
lhs rhs estimate std.error statistic p.value
1 versicolor - setosa 0 0.930 0.1029579 9.032819 0.000000e+00
2 virginica - setosa 0 1.582 0.1029579 15.365506 0.000000e+00
3 virginica - versicolor 0 0.652 0.1029579 6.332686 4.294805e-10
Next, I conducted bayesian glm procedure using stan like below code, and calculate the posterior distribution of the differences between coefficients in generated quantities section.
# Make the model matrix for Rstan
iris_mod <-
model.matrix(Sepal.Length ~ Species, data = iris) %>%
as.data.frame(.)
# Input data
stan_data <-
list(N = nrow(iris_mod),
SL = iris$Sepal.Length,
Intercept = iris_mod$`(Intercept)`,
versicolor = iris_mod$Speciesversicolor,
virginica = iris_mod$Speciesvirginica)
# Stan code
data{
int N;
real <lower = 0> SL[N];
int <lower = 1> Intercept[N];
int <lower = 0, upper = 1> versicolor[N];
int <lower = 0, upper = 1> virginica[N];
}
parameters{
real beta0;
real beta1;
real beta2;
real <lower = 0> sigma;
}
transformed parameters{
real mu[N];
for(n in 1:N) mu[n] = beta0*Intercept[n] + beta1*versicolor[n] +
beta2*virginica[n];
}
model{
for(n in 1:N) SL[n] ~ normal(mu[n], sigma);
}
generated quantities{
real diff_beta0_beta1;
real diff_beta1_beta2;
real diff_beta0_beta2;
diff_beta0_beta1 = (beta0 + beta1) - beta0;
diff_beta1_beta2 = (beta0 + beta1) - (beta0 + beta2);
diff_beta0_beta2 = (beta0 + beta2) - beta0;
}
library(rstan)
fit_stan <-
stan(file = "iris.stan", data = stan_data, chains = 4,
seed = 1234)
# confirmation of posterior distribution
print(fit_stan, pars = c("diff_beta0_beta1", "diff_beta1_beta2",
"diff_beta0_beta2"))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
diff_beta0_beta1 0.92 0 0.1 0.73 0.86 0.92 0.99 1.13 2041 1
diff_beta1_beta2 0.65 0 0.1 0.45 0.58 0.65 0.72 0.86 4000 1
diff_beta0_beta2 1.58 0 0.1 1.38 1.51 1.58 1.64 1.78 1851 1
Finally, I could get same results between the frequentist method and bayesian method. I think this is correct way, but I'm not sure this because there are no information nor examples. Also I also confirm this way could be extended another error distributions (including, poisson, gamma, binomial, negative- binomial etc.).
If there are another good ways or advices, please teach me.
You can calculate any function (including the difference in coefficients) of draws (such as those produced by Stan) from any proper posterior distribution.