Regex in Python to remove all uppercase characters before a colon


I have a text from which I would like to remove every run of consecutive uppercase characters that is followed by a colon. So far I have only figured out how to remove all characters up to and including a colon, which results in the current output shown below.

Input Text

text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'

Desired output:

 'This is a text. This is a second text. This is a third text'

Current code & output:

import re

re.sub(r'^.+[:]', '', text)

# current output (the greedy .+ consumes everything up to the last colon)
' This is a third text'

Can this be done with a one-liner regex, or do I need to iterate over every character, check it with isupper(), and then apply a regex?


Solution

  • You can use

    \b[A-Z]+:\s*
    
    • \b A word boundary to prevent a partial match
    • [A-Z]+: Match 1+ uppercase chars A-Z and a :
    • \s* Match optional whitespace chars

    import re
    
    text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'
    print(re.sub(r'\b[A-Z]+:\s*', '', text))
    

    Output

    This is a text. This is a second text. This is a third text
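
To make the two key points concrete, here is a small sketch. It first shows why the attempt in the question keeps only the last sentence (the greedy `.+` runs to the last colon), and then what `\b` changes. The `mixed` string is a made-up input used only for illustration; it does not come from the question.

```python
import re

text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'

# Attempt from the question: greedy .+ matches as far as it can,
# so the match ends at the LAST colon in the string.
greedy = re.sub(r'^.+[:]', '', text)
print(repr(greedy))  # ' This is a third text'

# Hypothetical input: a mixed-case token directly before a colon.
mixed = 'fooBAR: kept. ABC: removed.'

# With \b, the uppercase run must start at a word boundary,
# so the 'BAR:' inside 'fooBAR:' is left alone.
with_boundary = re.sub(r'\b[A-Z]+:\s*', '', mixed)

# Without \b, the trailing 'BAR: ' of 'fooBAR: ' is matched and removed too.
without_boundary = re.sub(r'[A-Z]+:\s*', '', mixed)

print(with_boundary)     # fooBAR: kept. removed.
print(without_boundary)  # fookept. removed.
```

Note that `[A-Z]` only covers ASCII uppercase letters; labels containing digits, underscores, or non-ASCII capitals would need a wider character class.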