a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
How can I remove the words that start with two letters followed by exactly two digits, without removing alphanumeric words in which the two letters are followed by more than two digits?
Expected output:
"it is and ZZ100DD22"
This code removes only the digits. Please help me get the expected output.
gsub('[[:digit:]]+', '', a)
You may use
gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b", "", a, perl=TRUE)
See the regex demo. An alternative that does not need perl=TRUE:
gsub("\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b", "", a)
Details
\s* - 0 or more whitespace chars
\b - a word boundary
[A-Za-z]{2} - two ASCII letters (use \p{L} to match any Unicode letters)
\d{2} - two digits
(?!\d) - there can be no digit immediately to the right
\w* - 0 or more letters, digits or underscores
\b - a word boundary.
Add (*UCP) at the start of the regex to make it fully Unicode-aware.
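To see what the (?!\d) lookahead contributes, here is a minimal sketch that tests single tokens against the core of the pattern. The ^ and $ anchors, the variable names, and the tokens "ZZ10" and "Z10AB" are assumptions added for illustration; they are not part of the original question.

```r
# Sample tokens; the last two are made up for illustration.
tokens <- c("ZZ10ASDJN123", "ZZ100DD22", "ZZ10", "Z10AB")

# Core of the pattern, anchored so that the whole token must match.
core <- "^[A-Za-z]{2}\\d{2}(?!\\d)\\w*$"

# perl = TRUE is required because lookaheads are a PCRE feature.
matches <- grepl(core, tokens, perl = TRUE)
print(setNames(matches, tokens))
```

Only tokens with exactly two digits right after the two leading letters should match. ZZ100DD22 is rejected because a third digit follows the first two, and Z10AB is rejected because it starts with a single letter.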
a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*", "", a, perl=TRUE)
## => [1] "it is and ZZ100DD22"
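The two patterns are not fully equivalent, which the following sketch tries to show. The helper names strip_lookahead and strip_default and the second input string are hypothetical, added only for this comparison. The lookahead version also removes a word that ends right after the two digits, while the alternative needs at least one letter or underscore after them.

```r
# First string is from the question; the second is a made-up edge case.
x <- c("it is ZZ10ASDJN123 and ZZ100DD22", "code ZZ10 here")

# PCRE version: two letters, two digits, then no further digit.
strip_lookahead <- function(s) {
  gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b", "", s, perl = TRUE)
}

# Default-engine version: a letter or underscore must follow the digits.
strip_default <- function(s) {
  gsub("\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b", "", s)
}

res_lookahead <- strip_lookahead(x)
res_default <- strip_default(x)
print(res_lookahead)
print(res_default)
```

If a bare code such as ZZ10 should also be removed, the lookahead version is probably the safer choice.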