Removing acronyms using regex, based on uppercase characters following a parenthesis

How can I remove the following?

Acronyms that start with an opening parenthesis followed by uppercase letters or digits, e.g. '(ABC' or '(ABC)' or '(ABC-2A)' or '(ABC-1)'.

But NOT words in parentheses that start with an uppercase letter followed by lowercase letters, e.g. '(Bobby)' or '(Bob went to the beach..)'. This is the part I am struggling with.


import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)']
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z]*\)?', '', string)
  print(cleaned_acronyms)

# current output:
>> 'went to the beach' # Correct
>> 'The girl -2A) is walking' # Not correct
>> 'The dog obby) is being walked' # Not correct
>> 'They are there' # Correct


# desired & correct output:
>> 'went to the beach'
>> 'The girl is walking'
>> 'The dog (Bobby) is being walked' # (Bobby) is NOT an acronym (uppercase + lowercase)
>> 'They are there'

Solution

Try a negative lookahead:

\((?![A-Z][a-z])[A-Z\d-]+\)?\s*

\( - A literal opening parenthesis.
(?![A-Z][a-z]) - Negative lookahead asserting that the position is not followed by an uppercase letter and then a lowercase letter.
[A-Z\d-]+ - Match 1+ uppercase letters, digits or hyphens.
\)? - An optional literal closing parenthesis.
\s* - 0+ whitespace characters.
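
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of that layout; the name ACRONYM is an assumption for illustration and not part of the original pattern.

```python
import re

# ACRONYM is a hypothetical name chosen for this sketch.
ACRONYM = re.compile(
    r"""
    \(                # literal opening parenthesis
    (?![A-Z][a-z])    # not followed by uppercase then lowercase, as in "(Bobby"
    [A-Z\d-]+         # one or more uppercase letters, digits or hyphens
    \)?               # optional closing parenthesis
    \s*               # any whitespace after the acronym
    """,
    re.VERBOSE,
)

print(ACRONYM.sub("", "The girl (ABC-2A) is walking"))     # The girl is walking
print(ACRONYM.sub("", "The dog (Bobby) is being walked"))  # The dog (Bobby) is being walked
```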

A sample Python script:

import re
text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*', '', string)
  print(cleaned_acronyms)

Prints:

went to the beach
The girl is walking
The dog (Bobby) is being walked
They are there
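
To see what the lookahead contributes, it may help to compare the pattern with and without it. The sketch below does that and also strips the trailing space that is left behind when the acronym ends the string. The names WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD and remove_acronyms are made up for this illustration.

```python
import re

# Hypothetical names for this comparison; only the first pattern is from the answer.
WITH_LOOKAHEAD = re.compile(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*')
WITHOUT_LOOKAHEAD = re.compile(r'\([A-Z\d-]+\)?\s*')


def remove_acronyms(s):
    """Remove parenthesised acronyms, then trim leftover edge whitespace."""
    return WITH_LOOKAHEAD.sub('', s).strip()


sample = 'The dog (Bobby) is being walked'

# Without the lookahead, "(B" alone satisfies the pattern and is removed.
print(WITHOUT_LOOKAHEAD.sub('', sample))  # The dog obby) is being walked

# With the lookahead, "(Bobby)" is left untouched.
print(remove_acronyms(sample))            # The dog (Bobby) is being walked

# The acronym at the end leaves a trailing space, which strip() removes.
print(repr(remove_acronyms('They are there (ABC)')))  # 'They are there'
```

One caveat: the lookahead only inspects the first two characters after the parenthesis, so a mixed-case token such as '(ABc)' would still be partially removed (leaving 'c)'). If such input is expected, the pattern would need a stricter check.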