Replace outliers with NA

I found this function and would like to adapt it so that it replaces outliers with NA instead of removing the whole observation (row).

I tried adding <- NA to the line data <- data[!outliers(data[[col]]),] but I cannot make it work. Could you help me adapt it, please?

Below is the code with some simulated data. Please let me know if you need anything else.

Thank you so much in advance.

cov.matone <- matrix(c(1, .0,
                       .0, 1), nrow = 2)

data <- data.frame(MASS::mvrnorm(n = 1e4, 
                                  mu = c(4, 4), 
                                  Sigma = cov.matone))

outliers <- function(x) {
  
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  
  upper_limit <- Q3 + (iqr * 1.5)
  lower_limit <- Q1 - (iqr * 1.5)
  
  x > upper_limit | x < lower_limit
}

remove_outliers <- function(data, cols = names(data)) {
  for (col in cols) {
    data <- data[!outliers(data[[col]]),]
  }
  data
}

data_nooutliers <- remove_outliers(data, c('X1', 'X2' ))

Solution

Instead of subsetting the rows of data inside the loop, which drops whole observations, use the replacement function is.na<- to set to NA only the elements of each column that are flagged by outliers(). Every row is kept.

remove_outliers <- function(data, cols = names(data)) {
  for (col in cols) {
    is.na(data[[col]]) <- outliers(data[[col]])
  }
  data
}
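Equivalently, the replacement can be written with ordinary subset assignment on a single column, which is probably closer to what the <- NA attempt was aiming for: is.na(x) <- idx does x[idx] <- NA. The key is to index the column data[[col]] rather than rows of data; something like data[rows, ] <- NA would overwrite every column of the selected rows. The sketch below uses a small hypothetical data frame df and a hypothetical function name replace_outliers_na so the result is easy to check by hand.

```r
# Same rule as outliers() in the question: flag values beyond 1.5 * IQR from the quartiles
outliers <- function(x) {
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  x > Q3 + 1.5 * iqr | x < Q1 - 1.5 * iqr
}

# Hypothetical helper: set flagged cells to NA, column by column, keeping every row
replace_outliers_na <- function(data, cols = names(data)) {
  for (col in cols) {
    # Same effect as is.na(data[[col]]) <- outliers(data[[col]])
    data[[col]][outliers(data[[col]])] <- NA
  }
  data
}

# Small hypothetical data: 9 lies above X1's upper fence (4.75 + 1.5 * 2.5 = 8.5)
df <- data.frame(X1 = c(1, 2, 3, 4, 5, 9),
                 X2 = c(10, 11, 12, 13, 14, 15))
res <- replace_outliers_na(df, c("X1", "X2"))

nrow(res)   # 6: no rows are dropped
res$X1      # 1 2 3 4 5 NA

# The is.na<- form gives the identical vector
x <- df$X1
is.na(x) <- outliers(x)
identical(x, res$X1)   # TRUE
```

Indexing the column directly keeps the data frame's shape, so other columns of the same row are left untouched.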

Note

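The following one-liner is a much simpler alternative to function outliers. It relies on boxplot.stats(), which applies the same 1.5 * IQR rule, and on the simulated data it flags exactly the same points, as the check below shows. Note, however, that boxplot.stats() computes the quartiles as Tukey's hinges (via fivenum()) rather than with quantile()'s default method, and that outliers2 returns FALSE instead of NA for missing values, so the two functions can disagree on other data.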

outliers2 <- function(x) x %in% boxplot.stats(x)$out

s1 <- lapply(names(data), \(x) outliers(data[[x]]))
s2 <- lapply(names(data), \(x) outliers2(data[[x]]))
identical(s1, s2)
#[1] TRUE
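A small deterministic check shows where the two can differ. The vectors below are made up for illustration: in c(1, 2, 3, 4, 5, 9), quantile() gives Q1 = 2.25 and Q3 = 4.75 (upper fence 8.5), while fivenum() gives hinges 2 and 5 (upper fence 9.5), so 9 falls between the two fences. The second vector shows the different handling of NA.

```r
# Tukey-fence outlier flag based on quantile() (default type 7), as in the question
outliers <- function(x) {
  Q1 <- quantile(x, probs = .25, na.rm = TRUE)
  Q3 <- quantile(x, probs = .75, na.rm = TRUE)
  iqr <- Q3 - Q1
  x > Q3 + 1.5 * iqr | x < Q1 - 1.5 * iqr
}

# One-liner based on boxplot.stats(), whose quartiles are Tukey's hinges from fivenum()
outliers2 <- function(x) x %in% grDevices::boxplot.stats(x)$out

x <- c(1, 2, 3, 4, 5, 9)
outliers(x)    # FALSE FALSE FALSE FALSE FALSE  TRUE  (9 > 8.5)
outliers2(x)   # all FALSE                           (9 < 9.5)

y <- c(1, 2, 3, 4, 100, NA)
outliers(y)    # FALSE FALSE FALSE FALSE  TRUE    NA
outliers2(y)   # FALSE FALSE FALSE FALSE  TRUE FALSE
```

For large continuous samples such as the simulated data, the hinges and the type-7 quartiles are very close, so disagreements are rare but possible for points sitting right at a fence.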