Search code examples
rtime-seriesregressionconfidence-intervalr-marginaleffects

Confidence Interval for Some (But Not All) Interaction Terms in a Polynomial Interacted with a Binary Treatment


I am looking for help estimating the effect of a third order polynomial term -- over and above the linear term at different values of the independent variable x. While I can estimate the point estimate, I am not sure how to calculate the confidence interval. I am thinking that marginal effects and the marginaleffects package may be the answer here -- but this is relatively new to me am not sure how to put this together.

If you have a different package, formula, or code that seems like the right answer, I'm definitely interested in that too! The general application here is an unexpected event design / interrupted time series using social science survey data.

Here is a reproducible example. In this example, there is a running variable X ranging between about x = -2.5 and x = 2.5 and the outcome variable Y. There are two quantities of interest:

  1. The initial effect at x = 0 (i.e. the jump)
  2. The decay effect (i.e. the extra effect over and above the linear slope that occurred before the effect

In this example, the outcome variable Y is a stepwise function of X. I am using a third order polynomial to approximate this function, which as the designer of the simulation we know, even though it is incorrect.

Here is a figure (produced by the code below) that visualizes the simulated data (blue dots), the original trajectory (dashed orange line), and the third order polynomial (black line).

Figure of simulated data, original trajectory, and third order polynomial

... and here is the code

library(margins)
library(tidyverse)


set.seed(1900)

# Create Data
N <- 200
quad <- data.frame(x = rnorm(N)) %>% 
  mutate(post = as.integer(x > 0),
         x_sq = x^2,
         x_th = x^3)

# Create outcome (stepwise outcome with slight upward slope, jump, gradual decrease back to original slope)
quad$y <- with(quad, 0.002 * x + 0.05 * post - 0.05 * x * post * (x <= 1) - 0.05 * (x > 1))

# Visualize outcome
plot(quad$x, quad$y)

# Specify model (third order polynomial)

mod_quad <- lm(formula = y ~ x + x_sq + x_th + post + post * x + post * x_sq + post * x_th, data = quad)

# Plot Predictions

plot_predictions(mod_quad, by = "x", vcov = TRUE) + 
  geom_point(data = quad,
             aes(x = x, y = y),
             color = "blue",
             alpha = 0.5) +
  geom_line(data = tibble(x = seq(min(quad$x), max(quad$x), 0.01),
                          y = x * 0.002),
            aes(x = x, y = y),
            linetype = "dashed",
            color = "darkorange") +
  theme_bw()

Thank you for your help! Let me know if there are any clarifying questions.


Solution

  • You should not hardcode the polynomial terms. To compute the standard errors, marginaleffects requires numerical derivatives which are obtained by slightly tweaking the value of x. But when you hardcode the polynomial terms, marginaleffects tweaks x but only the x column is affected, and not x_sq, x_th, etc.

    Instead, use the poly() function in the model fitting call, as suggested in comments:

    library(dplyr)
    library(ggplot2)
    library(marginaleffects)
    set.seed(1900)
    
    N <- 200
    quad <- data.frame(x = rnorm(N)) %>% 
      mutate(post = as.integer(x > 0),
             x_sq = x^2,
             x_th = x^3)
    
    quad$y <- with(quad, 0.002 * x + 0.05 * post - 0.05 * x * post * (x <= 1) - 0.05 * (x > 1))
    
    mod <- lm(formula = y ~ post * poly(x, 3), data = quad)
    
    plot_predictions(mod, condition = "x")  +
      geom_point(data = quad, aes(x, y))