How to remove outliers in only one column after grouping by another column in R


I want to remove outliers from a variable MEASURE after grouping by TYPE. I tried the following code, but it didn't work. I've searched, but I've only come across ways to remove outliers for a whole data frame or a single column, not within each group after grouping.

df2 <- df %>%
  group_by(TYPE) %>%
  mutate(MEASURE_WITHOUT_OUTLIERS = remove_outliers(MEASURE))

Solution

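  • You can use boxplot.stats() to find the outlier values within each group, then filter() to drop the matching rows. By default, boxplot.stats() flags points that lie more than 1.5 times the interquartile range beyond the hinges.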

    library(dplyr)
    
    df2 <- df %>%
      group_by(TYPE) %>%
      filter(!MEASURE %in% boxplot.stats(MEASURE)$out) %>%
      ungroup()
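
If you want to try this without your own data, here is a minimal sketch. The toy data frame is made up for illustration, and remove_outliers() is a hypothetical helper (it is not part of dplyr or base R), written as a guess at what the function in the question was meant to do. Option 1 drops the outlier rows as in the answer above; option 2 keeps every row and sets outliers to NA in a new column.

```r
library(dplyr)

# Toy data (made up for illustration): two groups, each with one extreme value.
df <- tibble(
  TYPE    = rep(c("A", "B"), each = 6),
  MEASURE = c(10, 11, 12, 13, 14, 100,   # 100 is far from the rest of group A
              50, 51, 52, 53, 54, 500)   # 500 is far from the rest of group B
)

# Option 1: drop outlier rows, with outliers judged within each TYPE.
df_filtered <- df %>%
  group_by(TYPE) %>%
  filter(!MEASURE %in% boxplot.stats(MEASURE)$out) %>%
  ungroup()

# Option 2: keep every row and replace outliers with NA instead.
# remove_outliers() is a hypothetical helper defined here, not a library function.
remove_outliers <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, NA)
}

df_flagged <- df %>%
  group_by(TYPE) %>%
  mutate(MEASURE_WITHOUT_OUTLIERS = remove_outliers(MEASURE)) %>%
  ungroup()
```

Because both versions run after group_by(TYPE), boxplot.stats() only sees the MEASURE values of one TYPE at a time, so a value is compared against its own group rather than the whole column. Option 2 may be the better fit if you need to keep the other columns of the outlier rows.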