Tags: r, ggplot2, statistics, boxplot, outliers

How do I easily find boxplot outliers?


Below is an example using the mtcars dataset. There is one outlier with a value of 33.9, but I want a function that finds all of them for a given column.

library(dplyr)
library(ggplot2)

mtcars %>%
  ggplot(aes(x = "", y = mpg)) +
  geom_boxplot(fill = "#2645df")

I do not know the formula for the boxplot whisker limits, so I read approximate cutoffs off the plot above and hard-coded them:

# Both conditions are combined so the second check does not overwrite the first
res <- ifelse(mtcars$mpg > 33 | mtcars$mpg < 10, "outlier", "not outlier")

This approach is both inefficient and incorrect: 33 is not the actual upper limit for outliers, and 10 is not the actual lower limit.


Solution

  • You can use boxplot.stats:

    my_outliers <- function(x, coef = 1.5) boxplot.stats(x, coef = coef)$out
    

    This is what graphics::boxplot uses. It bases the limits on the hinges returned by fivenum rather than on the quartiles from quantile, so it works slightly differently from what ggplot does, which I think is equivalent to:

    my_outliers2 <- function(x, coef = 1.5) {
      x[x > quantile(x, 0.75) + IQR(x) * coef | x < quantile(x, 0.25) - IQR(x) * coef]
    }
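
To see the difference on the question's data, here is a minimal sketch that runs both helpers on mtcars$mpg and then labels every row, replacing the hand-picked thresholds. The two helpers are repeated so the block runs on its own. is_outlier is a hypothetical name introduced here, and it uses the quantile-based rule that is only assumed above to match ggplot.

```r
# Helpers from the answer, repeated so this block is self-contained
my_outliers <- function(x, coef = 1.5) boxplot.stats(x, coef = coef)$out

my_outliers2 <- function(x, coef = 1.5) {
  x[x > quantile(x, 0.75) + IQR(x) * coef | x < quantile(x, 0.25) - IQR(x) * coef]
}

mpg <- mtcars$mpg

# Hinge-based rule (boxplot.stats): upper limit is 22.8 + 1.5 * 7.45 = 33.975
out_hinge <- my_outliers(mpg)    # numeric(0): 33.9 is inside the limit

# Quantile-based rule: upper limit is 22.8 + 1.5 * 7.375 = 33.8625
out_quant <- my_outliers2(mpg)   # 33.9

# Hypothetical helper: logical flag per element, using the quantile-based rule
is_outlier <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  fence <- coef * (q[2] - q[1])  # coef times the interquartile range
  x < q[1] - fence | x > q[2] + fence
}

# Same labels as in the question, but without hard-coded cutoffs
res <- ifelse(is_outlier(mpg), "outlier", "not outlier")
table(res)
```

On this column the two rules disagree about 33.9 because the lower hinge (15.35) and the first quartile (15.425) differ slightly, which moves the upper limit from 33.975 to 33.8625. If the goal is to match the points drawn by graphics::boxplot instead, swapping the body of is_outlier for x %in% boxplot.stats(x, coef = coef)$out should give the hinge-based labels.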