Tags: r, ggplot2, dplyr, linear-regression

Highlight highest residuals in a plot: R


I'm trying to learn how to highlight and annotate some points in a plot. For the purpose of a reproducible example, I'm using the UBSprices dataset from the alr4 package.

I'm drawing an OLS line and a y = x line. I want to highlight and annotate the points that are farthest from the OLS line (that is, the ones with the largest absolute residuals).

Here's my code so far:

ggplot(UBSprices, aes(x = bigmac2003, y = bigmac2009)) + geom_point() + geom_smooth(method = "lm", se = FALSE) + 
  geom_abline(color = "green", size = 1) + coord_fixed()

Solution

  • You could compute the residuals from the fitted model and then flag the points whose absolute residual is at or above some cutoff quantile (the 90th percentile here). For example:

    library(tidyverse)
    library(alr4)
    
    UBSprices %>% 
      mutate(resid = resid(lm(bigmac2009 ~ bigmac2003, data = .)),
             mark = abs(resid) >= quantile(abs(resid), probs = 0.9)) %>% 
      ggplot(aes(x = bigmac2003, y = bigmac2009)) + 
      geom_point(aes(colour=mark), show.legend=FALSE) + 
      geom_smooth(method = "lm", se = FALSE) + 
      geom_abline(color = "green", linewidth = 1) +  # on ggplot2 < 3.4.0 use size = 1
      coord_fixed() +
      theme_bw() +
      scale_colour_manual(values=c("blue", "red"))
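
The code above colours the flagged points but does not annotate them. One possible way to add labels is sketched below. It assumes that the row names of UBSprices hold the city names (worth checking with `head(rownames(UBSprices))`); the column name `city` and the object names `fit`, `plot_data` and `p` are made up for this illustration.

```r
library(dplyr)
library(ggplot2)
library(alr4)  # provides UBSprices

# Fit the model once so the residuals can be reused
fit <- lm(bigmac2009 ~ bigmac2003, data = UBSprices)

plot_data <- UBSprices %>%
  mutate(
    city  = rownames(UBSprices),  # assumption: row names are city names
    resid = resid(fit),
    # TRUE for points at or above the 90th percentile of |residual|
    mark  = abs(resid) >= quantile(abs(resid), probs = 0.9)
  )

p <- ggplot(plot_data, aes(x = bigmac2003, y = bigmac2009)) +
  geom_point(aes(colour = mark), show.legend = FALSE) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_abline(colour = "green") +  # y = x reference line (slope 1, intercept 0)
  # Label only the flagged points, nudged slightly above each point
  geom_text(data = dplyr::filter(plot_data, mark),
            aes(label = city), vjust = -0.8, size = 3) +
  coord_fixed() +
  theme_bw() +
  scale_colour_manual(values = c("FALSE" = "blue", "TRUE" = "red"))

p
```

If the labels overlap, `geom_text_repel()` from the ggrepel package can be swapped in for `geom_text()`. Changing `probs` moves the cutoff, for example `probs = 0.95` flags fewer points.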
    
