I'm running a bunch of identical logistic regression models across a variety of different treatment conditions from an experiment, and I have a handful of outcomes, so I've written a simple bit of code that loops through those and creates a dataframe of results at the end.
I think my question is fairly straightforward so I'll be succinct: what is the best way to subsequently calculate average marginal effects (AME) and CIs using my same framework? Of course, if there's some completely alternative way of doing this that requires a change to my strategy, that's fine too. Ideally, at the end, my goal is to have one big dataframe with the logit coefficients and CIs alongside the AMEs and CIs.
library(tidyverse)
library(broom)
# Simulate some data
set.seed(123)
n_obs <- 500
d <- data.frame(
white = sample(c(0, 1), n_obs, replace = TRUE), # binary indicator
gender = sample(c("male", "female"), n_obs, replace = TRUE), # categorical variable
funding_health = rnorm(n_obs, mean = 50, sd = 10), # continuous predictor
treat_condition = sample(c("A", "B"), n_obs, replace = TRUE) # treatment condition
)
logit1 <- -1 + 0.5 * d$white + 0.3 * (d$gender == "female") + 0.02 * d$funding_health
prob1 <- 1 / (1 + exp(-logit1))
d$outcome1 <- rbinom(n_obs, size = 1, prob = prob1)
logit2 <- -0.5 + 0.2 * d$white - 0.4 * (d$gender == "female") + 0.03 * d$funding_health
prob2 <- 1 / (1 + exp(-logit2))
d$outcome2 <- rbinom(n_obs, size = 1, prob = prob2)
outcomes <- list(outcome1 = "Outcome 1", outcome2 = "Outcome 2")
treat_conditions_list <- c("A", "B")
# Models
model_results <- expand.grid(outcome_var = names(outcomes), condition = treat_conditions_list) %>%
pmap_dfr(function(outcome_var, condition) {
mod <- glm(as.formula(paste(outcome_var,
"~ factor(white) + factor(gender) + funding_health")),
data = d,
family = binomial,
subset = treat_condition == condition,
na.action = na.exclude)
n <- nobs(mod) # Number of observations used in the model
tidy(mod, conf.int = TRUE) %>%
mutate(outcome = outcomes[[outcome_var]],
treat = condition,
N = n)
})
Main website for marginaleffects
documentation: https://marginaleffects.com/
Specific page on (average) slopes: https://marginaleffects.com/chapters/slopes.html
Load the package:
library(marginaleffects)
Replace this line:
tidy(mod, conf.int = TRUE) %>%
By this line:
avg_slopes(mod) %>%
And you get these results:
Term Contrast Estimate Std. Error z Pr(>|z|) S CI low CI high
funding_health dY/dX 0.008961 0.00275 3.254 0.00114 9.8 0.003564 0.01436
gender male - female -0.087811 0.05865 -1.497 0.13433 2.9 -0.202759 0.02714
white 1 - 0 0.290830 0.05852 4.970 < 0.001 20.5 0.176138 0.40552
funding_health dY/dX 0.003978 0.00288 1.382 0.16686 2.6 -0.001662 0.00962
gender male - female 0.052246 0.05848 0.893 0.37165 1.4 -0.062374 0.16687
white 1 - 0 -0.036878 0.05900 -0.625 0.53191 0.9 -0.152509 0.07875
funding_health dY/dX -0.000535 0.00305 -0.175 0.86084 0.2 -0.006516 0.00545
gender male - female -0.195410 0.06136 -3.184 0.00145 9.4 -0.315682 -0.07514
white 1 - 0 0.023226 0.06125 0.379 0.70454 0.5 -0.096824 0.14328
funding_health dY/dX 0.005933 0.00283 2.096 0.03604 4.8 0.000386 0.01148
gender male - female 0.085773 0.05744 1.493 0.13536 2.9 -0.026805 0.19835
white 1 - 0 0.006688 0.05757 0.116 0.90751 0.1 -0.106140 0.11952
Note that the results are "pretty-printed", but that you can extract columns normally, as with any other data frame:
colnames(model_results)
[1] "term" "contrast" "estimate" "std.error" "statistic" "p.value" "s.value" "conf.low" "conf.high" "outcome" "treat" "N"