I would like to replace values with NA in a data frame, but only if the value in a specified column is outside a defined range.
Here's an example: assume I have 5 columns in a data frame (called a, b, c, d, e). I want to check whether column "a" is outside a certain range (for example a < 2 or a > 5) and, if so, assign NA to the values in columns a, b and c, while the values in columns d and e stay the same.
a <- c(1, 3, 6, 1, 4)
b <- c(4, 5, 7, 5, 3)
c <- c(1, 2, 3, 5, 2)
d <- c(3, 3, 3, 5, 6)
e <- c(2, 2, 4, 2, 1)
data <- data.frame(cbind(a,b,c,d,e))
So the desired output would be:
a b c d e
NA NA NA 3 2
3 5 2 3 2
NA NA NA 3 4
NA NA NA 5 2
4 3 2 6 1
Here's what I've tried:
variables <- c("a", "b", "c")
new_data <- data %>%
mutate(across(variables), if_else(a < 2 | a > 5, NA_character_, ""))
Another idea was to put it in a for loop:
for (x in variables) {
new_data <- data %>%
mutate(across(all_of(variables)), if_else(a < 2 | a > 5, NA_character_, x))
}
But these solutions only add a column and do not change the values accordingly.
This is a simplified example. I would like to apply the solution to a greater number of variables. Any help is appreciated!
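You were almost there, but you need to take care with how you pass your function: it has to go inside across() as its second argument. In your attempts the parenthesis that closes across() comes before if_else(), so mutate() treats if_else() as a separate expression and adds its result as a new column. You can use the tidyverse formula interface like this (note that .x refers to the column you are currently mutating, and that the replacement is NA_real_ rather than NA_character_ because the columns are numeric):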
library(dplyr)
data %>%
mutate(across(c(a, b, c), ~ if_else(!between(a, 2, 5), NA_real_, .x)))
# a b c d e
# 1 NA NA NA 3 2
# 2 3 5 2 3 2
# 3 NA NA NA 3 4
# 4 NA NA NA 5 2
# 5 4 3 2 6 1
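Since you mention wanting to apply this to a larger number of variables, here is a minimal sketch that takes the column names from a character vector, as in your own attempt. It assumes the columns are all numeric (hence NA_real_). The names new_data_base and out_of_range are only illustrative, and the base R version is included as an alternative for comparison, not as the recommended way:

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  a = c(1, 3, 6, 1, 4),
  b = c(4, 5, 7, 5, 3),
  c = c(1, 2, 3, 5, 2),
  d = c(3, 3, 3, 5, 6),
  e = c(2, 2, 4, 2, 1)
)

# Columns to overwrite, stored as a character vector
variables <- c("a", "b", "c")

# dplyr: all_of() selects the columns named in the character vector;
# the lambda is the second argument of across()
new_data <- data %>%
  mutate(across(all_of(variables), ~ if_else(a < 2 | a > 5, NA_real_, .x)))

# Base R alternative: compute the row condition once, then assign NA
# to the selected rows and columns (assumes column a has no NA itself)
new_data_base <- data
out_of_range <- data$a < 2 | data$a > 5
new_data_base[out_of_range, variables] <- NA
```

To cover more columns, extending the variables vector should be enough in both versions; columns d and e are left untouched because they are not selected.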