I found this function and would like to adapt it so that it replaces outliers with NA instead of removing the whole observation.
I tried adding <- NA to the line data <- data[!outliers(data[[col]]),] but I could not make it work. Could you help me adapt it, please?
Here is the code with some simulated data. Please let me know if you need anything else.
Thank you so much in advance.
# simulated data: two uncorrelated normal variables with mean 4
cov.matone <- matrix(c(1, 0,
                       0, 1), nrow = 2)
data <- data.frame(MASS::mvrnorm(n = 1e4,
                                 mu = c(4, 4),
                                 Sigma = cov.matone))

# TRUE for values more than 1.5 * IQR below Q1 or above Q3
outliers <- function(x) {
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  upper_limit <- Q3 + (iqr * 1.5)
  lower_limit <- Q1 - (iqr * 1.5)
  x > upper_limit | x < lower_limit
}

# drops every row that has an outlier in any of the columns in cols
remove_outliers <- function(data, cols = names(data)) {
  for (col in cols) {
    data <- data[!outliers(data[[col]]), ]
  }
  data
}

data_nooutliers <- remove_outliers(data, c('X1', 'X2'))
Instead of assigning the loop results to the input data, use is.na<- to assign NA values to the elements flagged by function outliers.
remove_outliers <- function(data, cols = names(data)) {
  for (col in cols) {
    # set the flagged values of this column to NA; no rows are dropped
    is.na(data[[col]]) <- outliers(data[[col]])
  }
  data
}
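To see what the is.na<- version does, here is a minimal sketch on a small made-up data frame. The name toy and its values are assumptions for illustration only, not part of the question's data; the two functions are repeated so the block runs on its own.

```r
# Hypothetical toy data: 100 in x and -50 in y are obvious outliers.
toy <- data.frame(x = c(1, 2, 3, 4, 5, 100),
                  y = c(-50, 10, 11, 12, 13, 14))

# Same IQR rule as in the question.
outliers <- function(x) {
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  x > Q3 + (iqr * 1.5) | x < Q1 - (iqr * 1.5)
}

# Replace flagged values with NA, column by column, keeping all rows.
remove_outliers <- function(data, cols = names(data)) {
  for (col in cols) {
    is.na(data[[col]]) <- outliers(data[[col]])
  }
  data
}

toy_na <- remove_outliers(toy, c("x", "y"))

nrow(toy_na)      # still 6 rows
toy_na$x          # 1 2 3 4 5 NA
toy_na$y          # NA 10 11 12 13 14
```

Only the flagged cells change; the other value in each affected row is kept, which is the difference from subsetting with data[!outliers(...), ].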
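The following function gives the same result as function outliers in the check below and is a much simpler one-liner. Note that boxplot.stats builds the box from Tukey's hinges (see fivenum) rather than from quantile(), so the two functions can disagree on borderline values, mostly in small samples.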
outliers2 <- function(x) x %in% boxplot.stats(x)$out
s1 <- lapply(names(data), \(x) outliers(data[[x]]))
s2 <- lapply(names(data), \(x) outliers2(data[[x]]))
identical(s1, s2)
#[1] TRUE
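The identical() check holds for the large simulated sample, but it is not guaranteed in general. A small sketch of a case where the two functions differ, using a made-up vector x (an assumption for illustration) with one borderline value:

```r
# Hypothetical small sample; 9 is a borderline value.
x <- c(1, 2, 3, 4, 5, 9)

# IQR rule based on quantile(), as in the question.
outliers <- function(x) {
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  x > Q3 + (iqr * 1.5) | x < Q1 - (iqr * 1.5)
}

# One-liner based on boxplot.stats(), which uses hinges from fivenum().
outliers2 <- function(x) x %in% boxplot.stats(x)$out

quantile(x, c(.25, .75))  # 2.25 and 4.75, so the upper limit is 8.5
fivenum(x)[c(2, 4)]       # hinges 2 and 5, so the upper limit is 9.5

res1 <- outliers(x)       # flags 9, because 9 > 8.5
res2 <- outliers2(x)      # flags nothing, because 9 <= 9.5
identical(res1, res2)
#[1] FALSE
```

The two also treat missing values differently: outliers returns NA for an NA input, while %in% returns FALSE.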