Remove COMPLETE row if Outlier (e.g. 99% Quantile) is Detected from EVERY Column in a Dataframe

Let's say this is my data frame.

MyData <- data.frame(
+     X = sample(10:100, 21),
+     Y = sample(10:100, 21),
+     Z = sample(10:100, 21)
+ )

I understand how to print the quantiles of each column, either with sapply or apply:

> apply( MyData , 2, quantile , .99 , na.rm = TRUE )
   X    Y    Z 
98.0 97.6 92.8 
> sapply( MyData , quantile , .99 , na.rm = TRUE )
X.99% Y.99% Z.99% 
 98.0  97.6  92.8

However, deleting the whole ROW if a value above this threshold is detected – and this for EACH column – is not working for me. Any solution – with or without dplyris appreciated.

Solution

We can use filter_all from dplyr to filter rows for a condition for every columns. all_vars means all the columns needs to meet the condition.

set.seed(123)
MyData <- data.frame(
  X = sample(10:100, 21),
  Y = sample(10:100, 21),
  Z = sample(10:100, 21)
)

head(MyData)
#   X  Y  Z
# 1 36 73 47
# 2 80 67 43
# 3 46 98 23
# 4 87 99 22
# 5 91 71 30
# 6 13 56 50

library(dplyr)

MyData2 <- MyData %>% filter_all(all_vars(. <= quantile(., 0.99, na.rm = TRUE)))

head(MyData2)
#    X  Y  Z
# 1 36 73 47
# 2 80 67 43
# 3 46 98 23
# 4 91 71 30
# 5 13 56 50
# 6 54 60 32