I was wondering if it might be possible (and perhaps recommended) to obtain standardized coefficients from stan_glm()
in the rstanarm
package? (did not find anything specific in the documentation)
Can I just standardize all variables as in normal regression? (see below)
Example:
library("rstanarm")
fit <- stan_glm(wt ~ vs*gear, data = mtcars)
Standardization:
design <- wt ~ vs*gear
vars <- all.vars(design)
stand.vars <- lapply(mtcars[, vars], scale)
fit <- stan_glm(stand.vars, data = mtcars)
I would not say that it is affirmatively recommended, but I would recommend that you not subtract the sample mean and divide by the sample standard deviation of the outcome because the estimation uncertainty in those two statistics will not be propagated to the posterior distribution.
Standardizing the predictors is more debatable. You can do it, but it makes doing posterior prediction with new data harder because you have to remember to subtract the old means from the new data and divide by the old standard deviations.
The most computationally efficient approach is to leave the variables as they are but specify the non-default argument QR = TRUE
, especially if you are not going to modify the default (normal) priors on the coefficients anyway.
You can then standardize the posterior coefficients after-the-fact if standardized coefficients are of interest. To do so, you can do
X <- model.matrix(fit)
sd_X <- apply(X, MARGIN = 2, FUN = sd)[-1]
sd_Y <- apply(posterior_predict(fit), MARGIN = 1, FUN = sd)
beta <- as.matrix(fit)[ , 2:ncol(X), drop = FALSE]
b <- sweep(sweep(beta, MARGIN = 2, STATS = sd_X, FUN = `*`),
MARGIN = 1, STATS = sd_Y, FUN = `/`)
summary(b)
However, standardizing regression coefficients just gives the illusion of comparability across variables and says nothing about how germane a one standard deviation difference is, particularly for dummy variables. If your question is really whether manipulating this predictor or that predictor is going to make a bigger difference on the outcome variable, then simply simulate those manipulations like
PPD_0 <- posterior_predict(fit)
nd <- model.frame(fit)
nd[ , 2] <- nd[ , 2] + 1 # for example
PPD_1 <- posterior_predict(fit, newdata = nd)
summary(c(PPD_1 - PPD_0))
and repeat that process for other manipulations of interest.