Tags: r, ggplot2, tidyverse, purrr

How to plot leave-one-out regression lines with ggplot / tidyverse


I'd like to use ggplot to plot a regression line and also show the regression lines resulting from omitting each of the observations in turn.

Here is a basic plot, using the anscombe dataset:

library(ggplot2)

data(anscombe)

ggplot(anscombe, aes(x = x2, y = y2)) +
  geom_point(color = "blue", size = 4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "red", linewidth = 1.5) +
  labs(title = "Dataset 2") +
  theme_bw(base_size = 14)

I can do what I want manually, but it is very clunky. There is probably a tidyverse solution, most likely using purrr, but I can't think of it.

library(dplyr)  # provides slice()

ggplot(anscombe, aes(x = x2, y = y2)) +
  geom_point(color = "blue", size = 4) +
  geom_smooth(data = anscombe |> slice(-1), method = "lm", formula = y ~ x, se = FALSE, color = "grey") +
  geom_smooth(data = anscombe |> slice(-2), method = "lm", formula = y ~ x, se = FALSE, color = "grey") +
# ...
  geom_smooth(data = anscombe |> slice(-10), method = "lm", formula = y ~ x, se = FALSE, color = "grey") +
  geom_smooth(data = anscombe |> slice(-11), method = "lm", formula = y ~ x, se = FALSE, color = "grey") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "red", linewidth = 1.5) +
  labs(title = "Dataset 2" ) +
  theme_bw(base_size = 14)



Solution

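  • You could use lapply to loop over the row indices and build one geom_smooth layer per omitted observation. ggplot2 accepts a list of layers on the right-hand side of +, so the result can be added straight to the plot: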

    library(ggplot2)
    
    ggplot(anscombe, aes(x = x2, y = y2)) +
      geom_point(color = "blue", size = 4) +
      lapply(seq_len(nrow(anscombe)), function(i) {
        geom_smooth(
          data = anscombe[-i, ],
          method = "lm", formula = y ~ x, se = FALSE,
          color = "grey", linewidth = 1.5
        )
      }) +
      geom_smooth(
        method = "lm", formula = y ~ x, se = FALSE,
        color = "red", linewidth = 1.5
      ) +
      labs(title = "Dataset 2") +
      theme_bw(base_size = 14)
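
Since the question mentions purrr, here is a rough sketch of an alternative that draws all the leave-one-out lines with a single layer. It stacks the 11 reduced data sets into one data frame and groups the fits by the omitted row. The names loo_data and omitted are chosen here for illustration, and list_rbind() assumes purrr 1.0.0 or later.

```r
library(ggplot2)
library(purrr)

# Stack the 11 leave-one-out subsets into one data frame;
# `omitted` (a name chosen here) records which row was dropped.
loo_data <- seq_len(nrow(anscombe)) |>
  map(function(i) anscombe[-i, ]) |>
  list_rbind(names_to = "omitted")

p <- ggplot(anscombe, aes(x = x2, y = y2)) +
  geom_point(color = "blue", size = 4) +
  # One grey fit per value of `omitted`, all drawn by a single layer
  geom_smooth(
    data = loo_data, aes(group = omitted),
    method = "lm", formula = y ~ x, se = FALSE, color = "grey"
  ) +
  # Fit on the full data, drawn last so it sits on top
  geom_smooth(
    method = "lm", formula = y ~ x, se = FALSE,
    color = "red", linewidth = 1.5
  ) +
  labs(title = "Dataset 2") +
  theme_bw(base_size = 14)

p
```

Compared with the lapply version, this keeps the plot at three layers regardless of the number of observations, and the omitted column could also be mapped to color if you want to see which observation each line leaves out.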