Replace symbol before match using regex in Python


I have strings such as:

text1 = ('SOME STRING,99,1234 FIRST STREET,9998887777,ABC')
text2 = ('SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF')
text3 = ('ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI')

Desired output:

SOME STRING 99,1234 FIRST STREET,9998887777,ABC
SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF
ANOTHER STRING #88,4321 THIRD STREET,3332221111,GHI

My idea: use a regex to find a run of 1-5 digits, possibly preceded by a symbol, that sits between two commas and is not followed by a space and letters, then replace the comma before that match with a space. Something like:

text.replace(r'(,\d{0,5},)','.........')

Solution

  • If you can use the third-party regex module instead of re, then possibly:

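    import regex  # third-party module; supports variable-length lookbehind
    text = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
    # Match a comma only if no earlier comma precedes it and "#?digits,digits" follows.
    print(regex.sub(r'(?<!^.*,.*),(?=#?\d+,\d+)', ' ', text))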
    

    You might be able to use re if you are sure that no other substring in the input matches the pattern in the lookahead.

    import re
    text = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
    # Replaces every comma that is followed by "#?digits,digits", not just the first one.
    print(re.sub(r',(?=#?\d+,\d+)', ' ', text))
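If only the first comma should ever be replaced, a variant that stays within the standard re module is to anchor the pattern at the start of the string instead of using a lookbehind. The sketch below is one possible approach, not the only one. It assumes that the field to merge is always the second one, that it is 1-5 digits, and that the only optional symbol is `#`. The name `merge_second_field` is made up for illustration.

```python
import re

# Assumptions (not from the original answer): only the first comma is a
# candidate, the second field is 1-5 digits, and the optional symbol is '#'.
FIRST_COMMA = re.compile(r'^([^,]*),(?=#?\d{1,5},)')

def merge_second_field(text):
    """Replace the first comma with a space when the second field is
    1-5 digits (optionally prefixed by '#') followed by another comma."""
    # Group 1 is everything before the first comma; put it back, then a space.
    return FIRST_COMMA.sub(r'\1 ', text)

samples = [
    'SOME STRING,99,1234 FIRST STREET,9998887777,ABC',
    'SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF',
    'ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI',
]

for s in samples:
    print(merge_second_field(s))
```

Because `[^,]*` cannot cross a comma, the pattern can only match at the first comma. In the second sample that comma is followed by `56789 SECOND STREET`, which is not digits directly followed by a comma, so the string is left unchanged.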