I wanted to generate some predicted probabilities and I was recommended two commands: ggeffects::ggpredict()
and margin effects::predictions()
. However, some of the results seem similar (the conf.high) and other results seem different (the conf.low). What is going on here? Any insights would be helpful.
Here's an example:
logit <- glm(vs ~ mpg + cyl + disp + hp + drat + wt + qsec + am + gear + carb, mtcars, family = "poisson")
library(ggeffects)
ggpredict(logit, terms=c("mpg")) %>% as_tibble()
library(marginaleffects)
predictions(logit, by=c("mpg")) %>% as_tibble()
The two functions do different things by default.
ggpredict()
with the terms
argument computes predictions for each row of a data frame where mpg
takes on values between 10 and 32, and all other variables are held at their means, including am
, which is set to 0.41 even if it is presumably binary/categorical.predictions()
with the by
argument computes predictions for each row in the actually observed data frame, and then takes the average of these predictions for each unique value of mpg
.In other words, ggpredict()
reports predictions for “synthetic” data — hypothetical units which are exactly average along all dimensions except one. predictions()
reports averages of predictions made on actually observed data.
It is easy to reproduce the same results as ggpredict()
with predictions()
, by using the newdata
argument and the datagrid()
function:
library(marginaleffects)
library(ggeffects)
logit <- glm(vs ~ mpg + cyl + disp + hp + drat + wt + qsec + am + gear + carb, mtcars, family = "poisson")
ggpredict(logit,
terms = c("mpg")) |>
data.frame() |> head()
# x predicted std.error conf.low conf.high group
# 1 10 0.2243430 1.2888961 0.01793916 2.8055810 1
# 2 12 0.2018065 1.0826924 0.02417384 1.6847077 1
# 3 14 0.1815340 0.9117794 0.03039834 1.0840915 1
# 4 16 0.1632979 0.7991309 0.03410037 0.7819915 1
# 5 18 0.1468937 0.7707321 0.03243060 0.6653523 1
# 6 20 0.1321375 0.8352217 0.02570893 0.6791536 1
predictions(logit,
newdata = datagrid(mpg = seq(10, 34, by = 2), am = mean)) |>
head()
#
# mpg am Estimate Pr(>|z|) S 2.5 % 97.5 % cyl disp hp drat wt qsec
# 10 0.406 0.224 0.2462 2.0 0.0179 2.806 6.19 231 147 3.6 3.22 17.8
# 12 0.406 0.202 0.1394 2.8 0.0242 1.685 6.19 231 147 3.6 3.22 17.8
# 14 0.406 0.182 0.0613 4.0 0.0304 1.084 6.19 231 147 3.6 3.22 17.8
# 16 0.406 0.163 0.0233 5.4 0.0341 0.782 6.19 231 147 3.6 3.22 17.8
# 18 0.406 0.147 0.0128 6.3 0.0324 0.665 6.19 231 147 3.6 3.22 17.8
# 20 0.406 0.132 0.0154 6.0 0.0257 0.679 6.19 231 147 3.6 3.22 17.8
#
# Columns: rowid, estimate, p.value, s.value, conf.low, conf.high, vs, cyl, disp, hp, drat, wt, qsec, gear, carb, mpg, am
# Type: invlink(link)
PS: this is not a logit model.