Search code examples
rstatic-methodsadjustment

Add age adjustment to geom_smooth


I need to include age adjustment in the geom_smooth line I am adding to my ggscatter plot.

my data looks like~ table link

structure(list(Time = c(0L, 0L, 0L, 0L, 6L, 12L, 18L, 18L, 0L, 
12L, 18L, 6L), group = structure(c(1L, 1L, 2L, 2L, 1L, 3L, 3L, 
3L, 3L, 4L, 4L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"), 
    Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8, 
    67.8, 68.3, 68.8), Average = c(96L, 90L, 94L, 94L, 96L, 96L, 
    92L, 120L, 114L, 109L, 113L, 103L)), row.names = c(NA, 12L
), class = "data.frame")

What I currently have (the 'Average" value have dependency in age..):

ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm')+facet_wrap(~groups)

What I would like to have is something like:

ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm', adjust= ~age)+facet_wrap(~groups)

With adjustment per each group mean age

Any suggestions?


Solution

  • Here is I think what you are after.

    First, we need to fit the more complicated model because ggplot does not have a functionality for multivariable models (yet)

    fit <- lm(Average ~ Time + group + Age, data = tdata)
    

    Then we can use some functionality from the broom package to add the predictions and associated standard errors. With these in hand we can manually build the plot using the geom_line and geom_ribbon geoms

    library(broom)
    tdata %>% 
      bind_cols(augment(fit)) %>% 
      ggplot(aes(Time, Average))+
      geom_point()+
      geom_line(aes(x = Time, y = .fitted), size = 2, color = "blue")+
      geom_ribbon(aes(ymin = .fitted + .se.fit*2, ymax = .fitted - .se.fit*2), alpha = .2)+
      facet_wrap(~group)+
      theme_bw()
    

    Additionally, if you wanted to look at pooled vs non-pooled estimates

    fit_no_pool <- lm(Average ~ Time + group + Age, data = tdata)
    fit_complete_pool <- lm(Average ~ Time + Age, data = tdata)
    
    library(broom)
    tdata %>% 
      bind_cols(augment(fit_no_pool) %>% setNames(sprintf("no_pool%s", names(.)))) %>% 
      bind_cols(augment(fit_complete_pool) %>% setNames(sprintf("pool%s", names(.)))) %>% 
      ggplot(aes(Time, Average))+
      geom_point()+
      # Non-Pooled Estimates
      geom_line(aes(x = Time, y = no_pool.fitted, color = "blue"), size = 2)+
      geom_ribbon(aes(ymin = no_pool.fitted + no_pool.se.fit*2, 
                      ymax = no_pool.fitted - no_pool.se.fit*2), alpha = .2)+
      # Pooled Estimates
      geom_line(aes(x = Time, y = pool.fitted, color = "orange"), size = 2)+
      geom_ribbon(aes(ymin = pool.fitted + pool.se.fit*2, 
                      ymax = pool.fitted - pool.se.fit*2), alpha = .2)+
      facet_wrap(~group)+
      scale_color_manual(name = "Regression", 
                           labels = c("Pooled", "Non-Pooled"), 
                         values = c("blue", "orange"))+
      theme_bw()