Let's say this is my data frame.
MyData <- data.frame(
  X = sample(10:100, 21),
  Y = sample(10:100, 21),
  Z = sample(10:100, 21)
)
I understand how to print the quantiles of each column, either with sapply or apply:
> apply( MyData , 2, quantile , .99 , na.rm = TRUE )
X Y Z
98.0 97.6 92.8
> sapply( MyData , quantile , .99 , na.rm = TRUE )
X.99% Y.99% Z.99%
98.0 97.6 92.8
However, I can't work out how to delete the whole row whenever any column holds a value above its own threshold. Any solution, with or without dplyr, is appreciated.
We can use filter_all from dplyr to keep only the rows that satisfy a condition in every column. all_vars means that all of the columns need to meet the condition; here, each value must be at or below the 99th percentile of its own column.
set.seed(123)
MyData <- data.frame(
X = sample(10:100, 21),
Y = sample(10:100, 21),
Z = sample(10:100, 21)
)
head(MyData)
# X Y Z
# 1 36 73 47
# 2 80 67 43
# 3 46 98 23
# 4 87 99 22
# 5 91 71 30
# 6 13 56 50
library(dplyr)
MyData2 <- MyData %>% filter_all(all_vars(. <= quantile(., 0.99, na.rm = TRUE)))
head(MyData2)
# X Y Z
# 1 36 73 47
# 2 80 67 43
# 3 46 98 23
# 4 91 71 30
# 5 13 56 50
# 6 54 60 32
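As a possible alternative, filter_all is superseded in recent dplyr releases (it still works), and the same filter can be written with if_all. The sketch below assumes dplyr 1.0.4 or later, where if_all is available, and adds a base R version for anyone avoiding dplyr. The names kept_dplyr, kept_base, thresholds and within_limit are only illustrative.

```r
library(dplyr)

set.seed(123)
MyData <- data.frame(
  X = sample(10:100, 21),
  Y = sample(10:100, 21),
  Z = sample(10:100, 21)
)

# dplyr (assumes version >= 1.0.4): keep a row only if every column's value
# is at or below the 99th percentile of that column
kept_dplyr <- MyData %>%
  filter(if_all(everything(), ~ .x <= quantile(.x, 0.99, na.rm = TRUE)))

# Base R: compute one threshold per column ...
thresholds <- sapply(MyData, quantile, probs = 0.99, na.rm = TRUE, names = FALSE)

# ... compare each column against its own threshold ...
within_limit <- sweep(as.matrix(MyData), 2, thresholds, FUN = "<=")

# ... and keep the rows where all comparisons are TRUE
# (na.rm = TRUE means a row containing NA is dropped, as filter() would do)
kept_base <- MyData[rowSums(within_limit, na.rm = TRUE) == ncol(MyData), , drop = FALSE]

# Both approaches should select the same rows (base R keeps the original row names)
all.equal(kept_dplyr, kept_base, check.attributes = FALSE)
```

Because each column here holds 21 distinct values, its maximum always lies above the 99th percentile, so between one and three rows are removed.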