I have a dataset blah with a column kw. There are tens of thousands of strings, some of which are sentence-length. I already replaced the vast majority of what I want to replace with a for loop, replacing substrings with substring categories. However, I cannot possibly think of all the substrings that need replacing--while most of the heavy lifting is done, there are still a good number of edge cases, and I want to handle them as they arise.
I want to create a function cleanup to which I can pass an oldsubstring and a newsubstring, and which will replace every instance of oldsubstring in blah$kw with newsubstring.
Here's what I've written so far:
cleanup <- function(oldstring, newstring) {
  blah$kw[grepl(oldstring, blah$kw)] <- sapply(blah$kw[grepl(oldstring, blah$kw)],
                                               function(x) gsub(oldstring, newstring, x))
}
This may look stupid, I have no idea--I'm quite new to R. But I am basing it off of the one-off code I found, which is here:
blah$kw[grepl("oldstring", blah$kw)] <- sapply(blah$kw[grepl("oldstring", blah$kw)],
                                               function(x) gsub("oldstring", "newstring", x))
And which works just like a charm. Anyway, any help would be huge. Thanks!
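It's typically best practice not to hardcode the data set in the function, but to pass it in as an argument. Your version also assigns to a copy of blah that is local to the function, so the change never reaches the blah in your workspace; the function has to return the modified vector, and you assign the result back. What you're looking for can be accomplished via subsetting: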
cleanup <- function(df1, oldstring, newstring) {
df1[grepl(oldstring, df1)] <- gsub(oldstring, newstring, df1[grepl(oldstring, df1)])
df1
}
blah$kw <- cleanup(blah$kw, "a", "y")
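Note: this will not work if your strings are stored as factors, because a replacement value that is not an existing factor level becomes NA. Convert the column with as.character() first.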
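To illustrate the factor caveat, here is a minimal sketch. It assumes a made-up three-row blah data frame (the values are purely illustrative) and a slightly modified cleanup that converts a factor to character before replacing. The parameter name x is my own choice, not part of the code above.

```r
# Hypothetical toy data; `blah` and `kw` mirror the names in the question,
# but the values are invented for illustration.
blah <- data.frame(kw = factor(c("red apple", "green pear", "apple pie")))

cleanup <- function(x, oldstring, newstring) {
  # Assumption: factors are converted to character so that the new
  # strings are not restricted to the existing factor levels.
  if (is.factor(x)) x <- as.character(x)
  hits <- grepl(oldstring, x)                     # which elements contain the pattern
  x[hits] <- gsub(oldstring, newstring, x[hits])  # replace only in those elements
  x                                               # return the modified vector
}

# The function returns a modified copy, so assign it back to the column.
blah$kw <- cleanup(blah$kw, "apple", "fruit")
print(blah$kw)
```

Keep in mind that grepl and gsub treat oldstring as a regular expression by default. If your edge cases contain characters such as . or (, passing fixed = TRUE to both calls is one way to match them literally.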