Remove alphanumeric words that start with 2 letters followed by exactly 2 digits

a <- c("it is ZZ10ASDJN123 and ZZ100DD22")

How can I remove the words that start with two letters followed by exactly two digits, without removing alphanumeric words in which the two letters are followed by three or more digits?

Expected output:

"it is and ZZ100DD22"

This code removes only the digits. Please help me get the expected output.

gsub('[[:digit:]]+', '', a)

Solution

You may use

gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b", "", a, perl=TRUE)

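An alternative that works with the default regex engine (no perl=TRUE), provided the two digits are followed by at least one letter or underscore: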

gsub("\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b", "", a)

Details

\s* - 0 or more whitespace chars
\b - a word boundary
[A-Za-z]{2} - two ASCII letters (use \p{L} to match any Unicode letters)
\d{2} - two digits
(?!\d) - there can be no digit immediately to the right
\w* - 0 or more letters, digits or underscores
\b - a word boundary (redundant after the greedy \w*, which is why the R demo below omits it).

Add (*UCP) at the start of the regex to make it fully Unicode-aware.
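
As a rough sketch of the Unicode-aware variant mentioned above, assuming the PCRE build behind R has Unicode property support: the pattern swaps [A-Za-z] for \p{L} and prepends (*UCP). The names pattern_ucp, res_ascii and b are made up for illustration.

```r
# Unicode-aware variant of the pattern (a sketch; `pattern_ucp` and `b` are hypothetical names)
pattern_ucp <- "(*UCP)\\s*\\b\\p{L}{2}\\d{2}(?!\\d)\\w*"

# Same input as in the question: the result should match the ASCII-only pattern
a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
res_ascii <- gsub(pattern_ucp, "", a, perl = TRUE)
print(res_ascii)
## => [1] "it is and ZZ100DD22"

# Non-ASCII letters are written as \u escapes so the example does not depend on file encoding
b <- "it is \u00c4\u00d610abc and ZZ100DD22"
print(gsub(pattern_ucp, "", b, perl = TRUE))
```

Whether the second call removes the word with accented letters depends on the string being handled as UTF-8, so check it in your own locale.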

R demo:

a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*", "", a, perl=TRUE)
## => [1] "it is and ZZ100DD22"
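
To check a few edge cases, here is a small sketch that wraps the pattern in a helper. remove_codes is a made-up name (not part of base R) and the inputs are hypothetical. trimws() is added on the assumption that a leftover leading space is unwanted when the removed word comes first.

```r
# Hypothetical helper: drop words made of 2 letters + exactly 2 digits + optional word chars
remove_codes <- function(x) {
  out <- gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*", "", x, perl = TRUE)
  trimws(out)  # the pattern only consumes whitespace *before* the removed word
}

x <- c("ZZ10", "keep ZZ100", "drop AB12xyz now", "ABC12 stays", "ZZ10abc rest")
res <- remove_codes(x)
print(res)
# expected elements: "", "keep ZZ100", "drop now", "ABC12 stays", "rest"

# The alternative without perl=TRUE needs a letter or underscore after the two digits,
# so a bare "ZZ10" is left in place
alt <- gsub("\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b", "", "ZZ10")
print(alt)
## => [1] "ZZ10"
```

The lookahead version is the safer choice if words such as "ZZ10" (nothing after the two digits) should be removed as well.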