Below is an example using the mtcars dataset. There is one outlier with a value of 33.9, but I want a function that finds all of them for a given column.
library(dplyr)
library(ggplot2)
mtcars %>%
  ggplot(aes(x = "", y = mpg)) +
  geom_boxplot(fill = "#2645df")
I do not know the formula for boxplot whisker limits, so I read the cutoffs off the plot above and labelled the values manually:
# one call, so the second test does not overwrite the result of the first
res = ifelse(mtcars$mpg > 33 | mtcars$mpg < 10, "outlier", "not outlier")
This approach is both inefficient and incorrect: 33 is not the actual upper limit for outliers, and 10 is not the actual lower limit.
You can use boxplot.stats:
my_outliers <- function(x, coef = 1.5) boxplot.stats(x, coef = coef)$out
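As a rough sketch of what that call computes, assuming the hinge-based rule described in the boxplot.stats documentation (fences placed coef times the hinge spread beyond the hinges returned by fivenum), the same points can be found by hand. The names hinges, fences and by_hand are only illustrative.

```r
x <- mtcars$mpg

# Lower and upper hinges are elements 2 and 4 of Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Fences sit 1.5 hinge-spreads below the lower hinge and above the upper hinge
fences <- hinges + c(-1.5, 1.5) * diff(hinges)

# Points strictly outside the fences
by_hand <- x[x < fences[1] | x > fences[2]]

by_hand
boxplot.stats(x)$out
```

For mpg the upper hinge is 22.8 and the hinge spread is 7.45, so the upper fence is 33.975 and the value 33.9 is not reported as an outlier by this rule.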
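This is what graphics::boxplot uses. It works slightly differently from what ggplot2 does: boxplot.stats builds its limits from the hinges returned by fivenum, whereas ggplot2 uses the 25% and 75% quantiles, which I think is equivalent to: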
my_outliers2 <- function(x, coef = 1.5) {
  x[x > quantile(x, 0.75) + IQR(x) * coef | x < quantile(x, 0.25) - IQR(x) * coef]
}
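To get the "outlier" / "not outlier" labels from the question rather than the outlying values themselves, the same quantile rule can return a logical vector. This is a minimal sketch; is_outlier and mpg_flag are hypothetical names, and the na.rm = TRUE handling is an assumption about how missing values should be treated.

```r
# TRUE where x lies more than coef * IQR beyond the 25% / 75% quantiles
is_outlier <- function(x, coef = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]  # interquartile range
  x < q[1] - coef * spread | x > q[2] + coef * spread
}

# Work on a copy so the built-in dataset is left untouched
cars <- mtcars
cars$mpg_flag <- ifelse(is_outlier(cars$mpg), "outlier", "not outlier")

# Rows flagged as outliers
cars[cars$mpg_flag == "outlier", c("mpg", "mpg_flag")]
```

On mpg this flags only the 33.9 value (Toyota Corolla), matching the single point drawn by geom_boxplot. Because is_outlier takes any numeric vector, it can be applied to another column by changing the argument.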