I have a character vector
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
And I'm trying to remove span AND punctuation from every word in the vector:
> something thank great to hear your
The thing is, there's no rule for whether span will appear before or after the word I'm interested in. Also, span can be glued to: i) characters only (e.g. yourspan), ii) punctuation only (e.g. ..span?), or iii) characters and punctuation (e.g. somethingspan.).
I searched SO for the answer, but usually I see requests to remove whole words (like here) or elements of the string after/before a letter/punctuation (like here).
Any help will be appreciated.
You may use
[[:punct:]]*span[[:punct:]]*
See the regex demo.
Details
- [[:punct:]]* - 0+ punctuation chars
- span - a literal substring
- [[:punct:]]* - 0+ punctuation chars

words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")
words <- gsub("[[:punct:]]*span[[:punct:]]*", "", words) # Remove spans
words <- words[words != ""] # Discard empty elements
paste(words, collapse=" ") # Concat the elements
## => [1] "something thank great to hear your"
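
To see why the empty-element step is needed, it may help to look at the vector right after the gsub call, before filtering. The following is a minimal sketch using the same input; the name cleaned is only an illustrative variable, not part of the original code.

```r
words <- c("somethingspan.", "..span?", "spanthank", "great to hear", "yourspan")

# Remove "span" together with any punctuation directly before or after it
cleaned <- gsub("[[:punct:]]*span[[:punct:]]*", "", words)

print(cleaned)
## => [1] "something"     ""              "thank"         "great to hear" "your"
```

The second element, "..span?", consists only of punctuation and span, so it collapses to an empty string. That is the element that words[words != ""] discards.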
If whitespace-only elements remain after removing the unwanted strings, you may replace the second step with words <- words[trimws(words) != ""] (instead of words[words != ""]).
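
As a rough sketch, the steps can be wrapped into a small helper that uses the trimws variant. The function name remove_token and its token argument are hypothetical, introduced here only for illustration. The sketch assumes token contains no regex metacharacters, since it is pasted into the pattern as-is. The input is also modified to contain " span " so that a whitespace-only element actually occurs.

```r
# Hypothetical helper: strip a literal token plus adjacent punctuation,
# then drop elements that are empty or contain only whitespace.
# Assumption: `token` has no regex metacharacters (it is not escaped).
remove_token <- function(x, token = "span") {
  pattern <- paste0("[[:punct:]]*", token, "[[:punct:]]*")
  x <- gsub(pattern, "", x)
  x[trimws(x) != ""]
}

# " span " becomes two spaces after the replacement, which trimws() reduces to ""
words <- c("somethingspan.", " span ", "spanthank", "great to hear", "yourspan")
result <- paste(remove_token(words), collapse = " ")
print(result)
## => [1] "something thank great to hear your"
```

With the plain words[words != ""] filter, the two-space element would survive and leave extra spaces in the pasted string.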