I am cleaning some string data using some stringr functions as part of a pipe.
I would like these functions to be recursive, so that they handle every occurrence of a regex match, not only the first one. I cannot predict in advance how many times I would need to run the function to clean the data properly.
library(stringr)
test_1 <- "AAA A B BBB"
str_squish(str_remove(test_1, "\\b[A-Z]\\b"))
result <- "AAA B BBB"
desired <- "AAA BBB"
test_2 <- "AAA AA BBB BB CCCC"
str_replace(test_2, "(?<=\\s[A-Z]{2,3})\\s", "")
result <- "AAA AABBB BB CCCC"
desired <- "AAA AABBB BBCCCC"
You could use gsub, which replaces all matches rather than only the first:
test_1 <- "AAA A B BBB"
gsub(" +", " ", gsub("\\b[A-Z]\\b", "", test_1))
#[1] "AAA BBB"
test_2 <- "AAA AA BBB BB CCCC"
gsub("(?<=\\s[A-Z]{2})\\s", "", test_2, perl=TRUE)
#[1] "AAA AABBB BBCCCC"
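For the regex (?<=\\s[A-Z]{2,3})\\s it's not clear when the 2-3 letter condition should apply and from where matching should start. For example, stringr::str_replace_all checks every match against the original string, so the space after BBB also qualifies, and it would give: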
stringr::str_replace_all(test_2,"(?<=\\s[A-Z]{2,3})\\s","")
#[1] "AAA AABBBBBCCCC"
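To see why three spaces are removed here, it can help to list where the pattern matches in the untouched string. This is a minimal sketch in base R, assuming perl = TRUE. PCRE requires fixed-length lookbehind branches, so {2,3} is written as two alternatives; that is an adaptation for this sketch, not the pattern used above.

```r
test_2 <- "AAA AA BBB BB CCCC"

# Lookbehind split into two fixed-length branches (2 or 3 capitals after a space)
pattern <- "(?<=\\s[A-Z]{2}|\\s[A-Z]{3})\\s"

# 1-based start positions of every match in the original string
match_starts <- as.integer(gregexpr(pattern, test_2, perl = TRUE)[[1]])
match_starts
#[1]  7 11 14
```

The space after BBB (position 11) qualifies only while the space before BBB is still present, which is why a single pass over the original string and a one-at-a-time replacement give different results.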
You can also use a recursive function that reapplies the single replacement until the string stops changing:
f <- function(x) {
  # Remove only the first match, then recurse on the result
  y <- stringr::str_replace(x, "(?<=\\s[A-Z]{2,3})\\s", "")
  # Stop once a pass changes nothing
  if (x == y) x else f(y)
}
f(test_2)
#[1] "AAA AABBB BBCCCC"
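Since the data is cleaned in a pipe, the input is probably a vector, and if (x == y) only works for a single string. Below is a rough loop-based sketch of the same idea in base R. The helper name replace_until_stable and the max_iter safety cap are hypothetical, and the lookbehind is again split into two fixed-length branches because it runs with perl = TRUE.

```r
# Hypothetical helper: reapply a first-match replacement until no element changes.
replace_until_stable <- function(x, pattern, replacement = "", max_iter = 100L) {
  for (i in seq_len(max_iter)) {
    # sub() replaces only the first match in each element
    y <- sub(pattern, replacement, x, perl = TRUE)
    # Stop as soon as a full pass leaves the whole vector unchanged
    if (identical(x, y)) break
    x <- y
  }
  x
}

# {2,3} written as two fixed-length lookbehind branches for PCRE
pattern <- "(?<=\\s[A-Z]{2}|\\s[A-Z]{3})\\s"
cleaned <- replace_until_stable(c("AAA AA BBB BB CCCC", "AAA A B BBB"), pattern)
cleaned
#[1] "AAA AABBB BBCCCC" "AAA A B BBB"
```

A loop avoids deep recursion on long inputs, and max_iter guards against a replacement that never reaches a stable result.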