r, ggplot2, linear-regression

Drawing a regression line with interaction in ggplot2


I'm trying to plot the results of a linear regression in ggplot2. I'm interested in the effect of a categorical "treatment" variable (a factor) on my measurements over time. I want to estimate a separate slope for the time variable at each level of the "treatment" variable, but a single common intercept. The function call for my linear model is:

lm(measurement ~ time:treatment, data = example_data)

However, I am struggling to implement this specific formula in the geom_smooth() function in ggplot2.

This is what I have tried so far:

library(tidyverse)

#Generate an example dataset for reproducibility:
set.seed(42)
example_data = 
data.frame(treatment   = as.factor(c(rep("t1", 50), rep("t2", 50))),
           time        = rep(1:50, 2), 
           error       = rnorm(100, 0, 1))                      %>% 
mutate(    slope       = ifelse(treatment == "t1", -0.2, -0.3)) %>% 
mutate(    measurement = 2.5 + time * slope + error)          
                  
#Linear model that I want to display in my plot: 
model = lm(measurement ~ time:treatment, data = example_data)
summary(model)

#My attempt to plot the regression line:
ggplot(data = example_data, aes(x = time, y = measurement)) + 
  geom_point()  + 
  geom_smooth(method  = "lm", 
              formula = y ~ x:treatment) +
  facet_wrap(~treatment)

However, this does not plot any lines and returns the warning "object 'treatment' not found". It seems that x and y are the only variables that can be used in the "formula" argument.

Regression lines are plotted when I use "formula = y ~ x". However, this fits a separate intercept for each treatment, which is not the model I want.

I would really appreciate it if anybody could help me.


Solution

  • This can be done pretty easily with the ggpredict() function from the ggeffects package. You give it the fitted model and tell it which terms to get predictions for, here time and treatment. Any other predictors in the model are held constant at a central value.

    library(tidyverse)
    library(ggeffects)
    
    #Generate an example dataset for reproducibility:
    set.seed(42)
    example_data = 
      data.frame(treatment   = as.factor(c(rep("t1", 50), rep("t2", 50))),
                 time        = rep(1:50, 2), 
                 error       = rnorm(100, 0, 1))                      %>% 
      mutate(    slope       = ifelse(treatment == "t1", -0.2, -0.3)) %>% 
      mutate(    measurement = 2.5 + time * slope + error)          
    
    #Linear model that I want to display in my plot: 
    model = lm(measurement ~ time:treatment, data = example_data)
    summary(model)
    #> 
    #> Call:
    #> lm(formula = measurement ~ time:treatment, data = example_data)
    #> 
    #> Residuals:
    #>     Min      1Q  Median      3Q     Max 
    #> -3.2371 -0.5377  0.0119  0.7003  2.1431 
    #> 
    #> Coefficients:
    #>                   Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)       2.814833   0.210159   13.39   <2e-16 ***
    #> time:treatmentt1 -0.214277   0.007995  -26.80   <2e-16 ***
    #> time:treatmentt2 -0.307866   0.007995  -38.51   <2e-16 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> Residual standard error: 1.035 on 97 degrees of freedom
    #> Multiple R-squared:  0.9393, Adjusted R-squared:  0.938 
    #> F-statistic: 750.2 on 2 and 97 DF,  p-value: < 2.2e-16
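
How to read these coefficients can be sketched in base R. This is only an illustration: the names intercept, slope_t1, slope_t2 and at_zero are made up for this example, and the data are recreated without the tidyverse so the block runs on its own. It shows that both treatments share the intercept and differ only in their slope.

```r
# Recreate the example data with base R only (same recipe as above).
set.seed(42)
example_data <- data.frame(
  treatment = factor(rep(c("t1", "t2"), each = 50)),
  time      = rep(1:50, 2),
  error     = rnorm(100, 0, 1)
)
example_data$slope <- ifelse(example_data$treatment == "t1", -0.2, -0.3)
example_data$measurement <-
  2.5 + example_data$time * example_data$slope + example_data$error

model <- lm(measurement ~ time:treatment, data = example_data)

# One common intercept plus one time slope per treatment level.
b <- coef(model)
intercept <- b[["(Intercept)"]]
slope_t1  <- b[["time:treatmentt1"]]
slope_t2  <- b[["time:treatmentt2"]]

# At time = 0 both treatments are predicted to sit at the shared intercept.
at_zero <- predict(
  model,
  newdata = data.frame(
    time      = 0,
    treatment = factor(c("t1", "t2"), levels = levels(example_data$treatment))
  )
)
print(at_zero)
```

The fitted line for t1 is then intercept + slope_t1 * time, and the line for t2 is intercept + slope_t2 * time, so the two lines meet at time = 0 and fan out from there.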
    

    You can generate predictions using ggpredict(), which returns a data frame with the additional class "ggeffects". Calling plot() on this object draws the regression lines for both treatments overlaid in a single panel.

    g <- ggpredict(model, terms=c("time", "treatment")) 
    
    plot(g)
    

    Since the result of plot(g) is a ggplot, you can modify it in the usual ways (e.g., adding a facet on the group column, which holds the treatment levels).

    plot(g) + facet_wrap(~group)
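
If you would rather not add a dependency, a similar plot can probably be built with plain ggplot2 by computing the fitted values yourself and drawing them with geom_line() instead of geom_smooth(). The following is a minimal sketch under that assumption, not the ggeffects approach shown above: the columns fit, lwr and upr are names introduced here (they come from predict() with interval = "confidence"), and the data are recreated in base R so the block is self-contained.

```r
library(ggplot2)

# Recreate the example data with base R only (same recipe as above).
set.seed(42)
example_data <- data.frame(
  treatment = factor(rep(c("t1", "t2"), each = 50)),
  time      = rep(1:50, 2),
  error     = rnorm(100, 0, 1)
)
example_data$slope <- ifelse(example_data$treatment == "t1", -0.2, -0.3)
example_data$measurement <-
  2.5 + example_data$time * example_data$slope + example_data$error

model <- lm(measurement ~ time:treatment, data = example_data)

# Fitted values and 95% confidence limits from the model itself,
# so the lines reflect the common intercept.
pred <- predict(model, interval = "confidence")
example_data$fit <- pred[, "fit"]
example_data$lwr <- pred[, "lwr"]
example_data$upr <- pred[, "upr"]

# Draw the points, then the model's own line and band in each facet.
p <- ggplot(example_data, aes(x = time, y = measurement)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(aes(y = fit), colour = "blue") +
  facet_wrap(~treatment)

print(p)
```

Because the line comes from predict() rather than from a per-panel fit, both facets use the same intercept, which is what geom_smooth(formula = y ~ x) could not give.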
    

    Created on 2024-02-05 with reprex v2.0.2