r, dplyr, outliers

Tidyverse filter outliers - in one pipe


I want to filter out outliers within the tidyverse framework in a single pipe. For this example, an outlier is any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR = Q3 - Q1 is the interquartile range.

I managed to compute the upper and lower bounds for outliers, and I am familiar with the filter() function from dplyr. However, I do not know how to get the values calculated inside summarise() back into the complete data frame within the same pipe.

iris %>% 
  group_by(Species) %>% 
  summarise(IQR = IQR(Sepal.Length),
            O_upper = quantile(Sepal.Length, probs = 0.75, na.rm = FALSE) + 1.5 * IQR,
            O_lower = quantile(Sepal.Length, probs = 0.25, na.rm = FALSE) - 1.5 * IQR
  )

Is this even possible? Or would I need a second pipe? Or is there a more convenient way than to calculate the upper and lower limit myself?


Solution

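  • Use mutate() instead of summarise(). mutate() keeps every row and attaches the per-group bounds as columns, so you can filter() on them in the same pipe: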

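    library(dplyr)

    iris %>% 
      group_by(Species) %>% 
      # mutate() keeps all rows; each bound is computed once per Species
      mutate(IQR = IQR(Sepal.Length),
             O_upper = quantile(Sepal.Length, probs = 0.75, na.rm = FALSE) + 1.5 * IQR,
             O_lower = quantile(Sepal.Length, probs = 0.25, na.rm = FALSE) - 1.5 * IQR
      ) %>% 
      # keep only the rows that lie inside [O_lower, O_upper]
      filter(O_lower <= Sepal.Length & Sepal.Length <= O_upper)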
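
  • If you would rather not carry the helper columns around, a grouped filter() can evaluate the bounds directly, since dplyr evaluates the condition once per group. The following is only a sketch: is_outlier() is a hypothetical helper written for this answer, not a dplyr function, and the multiplier k = 1.5 is just the rule from the question.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): returns TRUE where x lies
# outside [Q1 - k * IQR, Q3 + k * IQR].
is_outlier <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

iris_no_outliers <- iris %>%
  group_by(Species) %>%
  # the helper sees only the rows of the current Species
  filter(!is_outlier(Sepal.Length)) %>%
  ungroup()
```

    Because the helper is called after group_by(), the quartiles are computed per species, just as in the mutate() version. Note that filter() also drops rows where the condition is NA, so missing values of Sepal.Length would be removed along with the outliers.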