Search code examples
rggplot2prediction

How to plot predictor by treatment adjusted for covariates etc.?


I want to plot the treatment effect of a fit with cubic predictors and lots of covariates and interactions adjusted for. With ggplot I can easily group the data by treatment and add a geom_smooth() to obtain this, without adjustment, though. I applied this answer to my problem, but it's a pain when you have a bunch of adjustments in your model and almost not applicable when you have data in long format. So my question is if there is an easier way to get what I want.

Some data

set.seed(42)
n <- 1e4
D <- rbinom(n, 1, .5)  # treatment indicator
X <- .5 + rnorm(n)  # bunch of covariates and other adjustemnts
P <- 5.54 + 0.35*D -.24*X + rnorm(n)  # predictor
Y <- 1.49 - 1.35*P + .5*P^2 - 0.04*P^3 - 0.83*D + 0.43*X + rnorm(n, 0, 6)
df1 <- data.frame(D, X, P, Y)

Model, complete and incomplete specified

true <- lm(Y ~ P + I(P^2) + I(P^3) + D + X , df1)  # true model
bias <- lm(Y ~ P + I(P^2) + I(P^3) + D, df1)

> round(rbind(true=coef(true), bias=c(coef(bias), NA)),
+       3)
     (Intercept)     P I(P^2) I(P^3)      D    X
true      -4.023 1.803 -0.088 -0.005 -0.728 0.42
bias      -3.426 1.753 -0.091 -0.005 -0.702   NA

So there is quite a difference of what ggplot will show me compared to the true model in the customary plot.

Plot w/o covariates

library(ggplot2)
p1 <- ggplot(df1, aes(P, Y, color=as.factor(D), group=D)) +
  geom_smooth(se=FALSE) +
  theme_bw()
p2 <- ggplot(df1, aes(P, Y, color=as.factor(D), group=D)) +
  stat_smooth(method="lm", formula=y ~ poly(x, 3, raw=TRUE), se=FALSE) +
  theme_bw()
egg::ggarrange(p1, p2)

enter image description here

Applying the mentioned solution to my problem gives me the following.

Prediction

n.data <- data.frame(D=rep(range(D), each=n/2),
                   P=rep(seq(range(df1$P)[1], range(df1$P)[2],
                              length.out=n/2), times=2),
                   X=rep(seq(range(df1$X)[1], range(df1$X)[2],  # assume this dozens of!
                             length.out=n/2), times=2))
df1.2 <- data.frame(n.data, pred=predict(true, n.data))

Plot w/ covariates

p1a <- ggplot(df1, aes(x=P, y=Y, color=as.factor(D)))  +
  geom_smooth(data=df1.2, aes(x=P, y=pred, color=as.factor(D))) +
  theme_bw()
p2a <- ggplot(df1, aes(x=P, y=Y, color=D)) +
  stat_smooth(method="lm", formula=y ~ poly(x, 3, raw=TRUE),
              data=df1.2, aes(x=P, y=pred, color=as.factor(D))) +
  theme_bw()
egg::ggarrange(p1a, p2a)

enter image description here

It looks as if it is what I want, I don't trust it very much, though. Anyway, might there be a more simple and more reliable way to get such plots?


Solution

  • I know the question is old, so just for reference!

    I would go with the effects package:

    set.seed(42)
    n <- 1e4
    D <- rbinom(n, 1, .5)  # treatment indicator
    X <- .5 + rnorm(n)  # bunch of covariates and other adjustemnts
    P <- 5.54 + 0.35*D -.24*X + rnorm(n)  # predictor
    Y <- 1.49 - 1.35*P + .5*P^2 - 0.04*P^3 - 0.83*D + 0.43*X + rnorm(n, 0, 6)
    df1 <- data.frame(D = factor(D, labels = c("Control", "Treatment")), X, P, Y)
    true <- lm(Y ~ poly(P, 3, raw = TRUE):D + X , df1)  # true model
    library(effects)
    plot(predictorEffect("P", true), lines=list(multiline=TRUE))
    

    If you want ggplot, there is the ggeffects package wich does basically the same, but with the ggplot2 system.