I need to replace the values in column a of a data frame with the corresponding elements of column b.
Column a : c("abc*^!", "abcde+", "abcde123********++++++", "Post TCZ 6 hours (+-3hrs)")
Column b : c('xx', 'yy', 'zz', 'aa')
If I call stringr::str_replace_all directly, it does not work, because the * and + symbols in column a are treated as regex metacharacters. To work around this, I wrote a function that escapes the special characters so that the pattern matching, and therefore the string replacement, works as expected. However, str_replace_all does not seem to like reading the patterns from a column in the data frame. Is there a way to achieve this?
Note: I am using this method as a continuation of existing code that lives in git and is used by many other teams.
Here is the evidence that the regex patterns (column VISIT_INSTACE1) created by the escape_spl_chars function are the correct matching patterns for column a. Please let me know if someone can shed some light on this.
x <- data.frame(a = c("abc*^!", "abcde+", "abcde123********++++++", "Post TCZ 6 hours (+-3hrs)"),
                b = c('xx', 'yy', 'zz', 'aa'), stringsAsFactors = FALSE)
escape_spl_chars <- function(arg1) {
  return_x <- sapply(strsplit(arg1, "", fixed = TRUE), function(y) {
    pasted_chars <- sapply(y, function(char) {
      # Convert character to ASCII code
      ascii_code <- as.integer(charToRaw(char))
      # Check if it's a special character and escape it
      if ((ascii_code >= 33 & ascii_code <= 47) |
          (ascii_code >= 58 & ascii_code <= 64) |
          (ascii_code >= 91 & ascii_code <= 96) |
          (ascii_code >= 123 & ascii_code <= 126)) {
        return(paste0("\\\\", char)) # Escape special character
      } else if (ascii_code == 32) {
        return(paste0("\\\\", 's')) # Escape space character
      } else {
        return(char) # Return normal character
      }
    })
    # Collapse the characters back into a single string
    paste0(pasted_chars, collapse = "")
  })
  return(return_x)
}
library(dplyr) # provides %>% and mutate
x1 <- x %>% mutate(VISIT_INSTACE1 = escape_spl_chars(a))
x1 <- x1 %>% dplyr::mutate(newvisitcode = stringr::str_replace_all(a, stringr::str_trim(VISIT_INSTACE1), b))
Use fixed() so that the pattern is compared as literal characters rather than interpreted as a regex.
library(stringr)
library(dplyr)
x %>% mutate(newvisitcode = str_replace_all(string = a, pattern = fixed(a), replacement = b))
                          a  b newvisitcode
1                    abc*^! xx           xx
2                    abcde+ yy           yy
3    abcde123********++++++ zz           zz
4 Post TCZ 6 hours (+-3hrs) aa           aa
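If the regex route is still needed for some reason, the likely problem in the question's function is the number of backslashes: in R source, "\\\\" is a string of two backslash characters, which the regex engine reads as one literal backslash followed by an unescaped metacharacter. A minimal sketch of the difference, assuming a single backslash per special character is what was intended; escape_regex is a hypothetical helper written for this illustration, not something from the original code or from stringr:

```r
library(stringr)

a <- c("abc*^!", "abcde+", "abcde123********++++++", "Post TCZ 6 hours (+-3hrs)")
b <- c("xx", "yy", "zz", "aa")

# Literal matching: pattern and replacement are paired element-wise with `a`.
literal_result <- str_replace_all(a, fixed(a), b)

# Hypothetical helper (an assumption, not part of the original post):
# prefix every character that is neither alphanumeric nor whitespace
# with ONE backslash so the regex engine treats it literally.
# In the replacement, "\\\\" is a literal backslash and "\\1" is the matched character.
escape_regex <- function(s) gsub("([^[:alnum:][:space:]])", "\\\\\\1", s)

# Regex matching with the escaped patterns.
regex_result <- str_replace_all(a, escape_regex(a), b)

print(literal_result)
print(identical(literal_result, regex_result))
```

Both calls should give the same replacements on this data. fixed() is still the simpler choice, because it avoids building and maintaining an escaping function at all.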