Search code examples
rcomparisonglmbayesianrstan

How to calculate differences between the coefficients (categorical variables) of glm procedure in Rstan


I have a question about the how to calculate differences between the coefficients (categorical variables) of glm in Rstan.

As example, I used iris dataset in R to judge whether I can calculate the posterior distribution of differences of coefficients.

At first, I conducted a basic glm procedure like below and calculate the significant differences of coefficients.

library(tidyverse)
library(magrittr)
library(multcomp)

iris_glm <-
glm(Sepal.Length ~ Species, data = iris) 

multcomp::glht(iris_glm, linfct = mcp(Species = "Tukey")) %>% 
summary(.) %>% 
broom::tidy() 

                     lhs rhs estimate std.error statistic      p.value
1    versicolor - setosa   0    0.930 0.1029579  9.032819 0.000000e+00
2     virginica - setosa   0    1.582 0.1029579 15.365506 0.000000e+00
3 virginica - versicolor   0    0.652 0.1029579  6.332686 4.294805e-10

Next, I conducted bayesian glm procedure using stan like below code, and calculate the posterior distribution of the differences between coefficients in generated quantities section.

# Make the model matrix for Rstan
iris_mod <-
model.matrix(Sepal.Length ~ Species, data = iris) %>%
as.data.frame(.)

# Input data
stan_data <-
list(N = nrow(iris_mod),
SL = iris$Sepal.Length,
Intercept = iris_mod$`(Intercept)`,
versicolor = iris_mod$Speciesversicolor,
virginica = iris_mod$Speciesvirginica)

# Stan code

data{
int N;
real <lower = 0> SL[N];
int <lower = 1> Intercept[N];
int <lower = 0, upper = 1> versicolor[N];
int <lower = 0, upper = 1> virginica[N];
}

parameters{
real beta0;
real beta1;
real beta2;
real <lower = 0> sigma;
}

transformed parameters{
real mu[N];
for(n in 1:N) mu[n] = beta0*Intercept[n] + beta1*versicolor[n] +       
beta2*virginica[n];
}

model{
for(n in 1:N) SL[n] ~ normal(mu[n], sigma);
}

generated quantities{
real diff_beta0_beta1;
real diff_beta1_beta2;
real diff_beta0_beta2;

diff_beta0_beta1 = (beta0 + beta1) - beta0;
diff_beta1_beta2 = (beta0 + beta1) - (beta0 + beta2);
diff_beta0_beta2 = (beta0 + beta2) - beta0;
}


library(rstan)
fit_stan <- 
stan(file = "iris.stan", data = stan_data, chains = 4, 
seed = 1234)

# confirmation of posterior distribution
print(fit_stan, pars = c("diff_beta0_beta1", "diff_beta1_beta2", 
"diff_beta0_beta2"))

                  mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
diff_beta0_beta1 0.92       0 0.1 0.73 0.86 0.92 0.99  1.13  2041    1
diff_beta1_beta2 0.65       0 0.1 0.45 0.58 0.65 0.72  0.86  4000    1
diff_beta0_beta2 1.58       0 0.1 1.38 1.51 1.58 1.64  1.78  1851    1

Finally, I could get same results between the frequentist method and bayesian method. I think this is correct way, but I'm not sure this because there are no information nor examples. Also I also confirm this way could be extended another error distributions (including, poisson, gamma, binomial, negative- binomial etc.).

If there are another good ways or advices, please teach me.


Solution

  • You can calculate any function (including the difference in coefficients) of draws (such as those produced by Stan) from any proper posterior distribution.