I have a dataframe:
mydf <- data.frame(
col1 = c("54", "abc", "123", "54 abc", "zzz", "a", "99"),
col2 = c("100", "200", "300", "400", "500", "600", "700"),
stringsAsFactors = FALSE
)
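In this dataframe, I want to replace every element with NA unless it meets one of these conditions:
- the element consists only of digits, or
- the element is exactly one of the strings in target_string (here "a" or "zzz").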
I was not sure how to do this in R using apply, so I tried to write a loop:
target_string <- c("a", "zzz")
replace_with_na_old <- function(df, target_string) {
  for (i in 1:nrow(df)) {
    for (j in 1:ncol(df)) {
      value <- df[i, j]
      # Keep digits-only values and exact matches; everything else becomes NA
      if (!grepl("^[0-9]+$", value) && !(value %in% target_string)) {
        df[i, j] <- NA
      }
    }
  }
  return(df)
}
mydf_cleaned_old <- replace_with_na_old(mydf, target_string)
Is there another way to do this?
Note: if I want partial, %like%-style matching instead of the exact %in% match, I can test each target string as a pattern with grepl:
replace_with_na_new <- function(df, target_string) {
  for (i in 1:nrow(df)) {
    for (j in 1:ncol(df)) {
      value <- df[i, j]
      # Keep digits-only values and values containing any target pattern
      if (!grepl("^[0-9]+$", value) &&
          !any(sapply(target_string, function(pattern) grepl(pattern, value)))) {
        df[i, j] <- NA
      }
    }
  }
  return(df)
}
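You already have the necessary logic to check this; all you need to do is vectorize it. Both grepl() and %in% work on whole vectors, so the check can be applied to an entire column at once instead of cell by cell.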
replace_with_na <- function(value, target_string) {
  value[!(grepl('^\\d+$', value) | value %in% target_string)] <- NA
  value
}
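A quick check on a single column's values shows the helper handling a whole vector in one call. The helper is repeated so the snippet runs on its own; x and keep are illustrative names, not part of the original answer.

```r
# Same helper as above, repeated so this snippet is self-contained
replace_with_na <- function(value, target_string) {
  value[!(grepl('^\\d+$', value) | value %in% target_string)] <- NA
  value
}

target_string <- c("a", "zzz")
x <- c("54", "abc", "123", "54 abc", "zzz", "a", "99")

# One logical per element of x: TRUE means "keep this value"
keep <- grepl('^\\d+$', x) | x %in% target_string
print(keep)

# The FALSE positions ("abc" and "54 abc") become NA
print(replace_with_na(x, target_string))
```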
Now you can apply this function to each column using any of the base R apply-family functions, for example lapply():
new_df <- mydf
new_df[] <- lapply(mydf, replace_with_na, target_string)
new_df
#   col1 col2
# 1   54  100
# 2 <NA>  200
# 3  123  300
# 4 <NA>  400
# 5  zzz  500
# 6    a  600
# 7   99  700
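Another base R option, sketched here as an alternative rather than part of the answer above, is to build one logical matrix covering the whole data frame and assign NA through it, since a data frame accepts a logical matrix of its own dimensions as an index. The names keep_cols, keep, and new_df2 are illustrative.

```r
mydf <- data.frame(
  col1 = c("54", "abc", "123", "54 abc", "zzz", "a", "99"),
  col2 = c("100", "200", "300", "400", "500", "600", "700"),
  stringsAsFactors = FALSE
)
target_string <- c("a", "zzz")

# TRUE wherever an element should be kept, one logical vector per column
keep_cols <- lapply(mydf, function(x) grepl("^\\d+$", x) | x %in% target_string)

# Bind the per-column results into a matrix with the same shape as mydf
# (matrix() fills column by column, matching the column order of mydf)
keep <- matrix(unlist(keep_cols), nrow = nrow(mydf), ncol = ncol(mydf))

new_df2 <- mydf
# A logical matrix index selects individual cells, so every FALSE cell
# is set to NA in a single assignment
new_df2[!keep] <- NA
new_df2
```

Building the matrix with explicit nrow and ncol, rather than relying on sapply() to simplify, keeps the shape correct even for a one-row data frame.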
Or, if you prefer dplyr, across() gives the same result (the \(x) shorthand for function(x) needs R 4.1 or later):
library(dplyr)
mydf %>% mutate(across(everything(), \(x) replace_with_na(x, target_string)))
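The same vectorizing idea extends to the partial-match (%like%-style) variant from the question: instead of looping over the patterns, join them into one regular expression alternation. This is a sketch under the assumption that each entry of target_string is a valid regex on its own, which is also what the loop version assumes when it passes them to grepl(); replace_with_na_like is a hypothetical name.

```r
# Hypothetical helper: vectorized version of the partial-match loop
replace_with_na_like <- function(value, target_string) {
  # Digits-only elements are always kept
  keep <- grepl("^\\d+$", value)
  if (length(target_string) > 0) {
    # Join the patterns into one alternation, e.g. "a|zzz".
    # Assumes every entry is a valid regular expression by itself.
    pattern <- paste(target_string, collapse = "|")
    keep <- keep | grepl(pattern, value)
  }
  value[!keep] <- NA
  value
}

mydf <- data.frame(
  col1 = c("54", "abc", "123", "54 abc", "zzz", "a", "99"),
  col2 = c("100", "200", "300", "400", "500", "600", "700"),
  stringsAsFactors = FALSE
)
target_string <- c("a", "zzz")

like_df <- mydf
like_df[] <- lapply(mydf, replace_with_na_like, target_string)
like_df
```

With target_string <- c("a", "zzz") nothing in col1 becomes NA, because "abc" and "54 abc" both contain "a"; that is exactly how partial matching differs from the exact %in% match. The length check avoids an empty pattern "", which would match every string, whereas the loop version keeps nothing extra when target_string is empty.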