I have strings such as:
text1 = ('SOME STRING,99,1234 FIRST STREET,9998887777,ABC')
text2 = ('SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF')
text3 = ('ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI')
Desired output:
SOME STRING 99,1234 FIRST STREET,9998887777,ABC
SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF
ANOTHER STRING #88,4321 THIRD STREET,3332221111,GHI
My idea: use a regex to find runs of 1-5 digits, possibly preceded by a symbol, that sit between two commas and are not followed by a space and letters, then replace the comma before the match with a space. Something like:
text.replace(r'(,\d{0,5},)','.........')
If you use the regex module instead of re, then possibly:
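import regex  # third-party module; supports variable-length lookbehind
text = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
# Replace a comma only if no comma precedes it and "<optional #><digits>,<digits>" follows
print(regex.sub(r'(?<!^.*,.*),(?=#?\d+,\d+)', ' ', text))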
You might be able to use re if you are sure no other substring in the string matches the pattern in the lookahead.
import re
text = "ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI"
# Replace every comma that is followed by "<optional #><digits>,<digits>" with a space
print(re.sub(r',(?=#?\d+,\d+)', ' ', text))
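If a later comma might also match the lookahead, one possible way to stay with re is to anchor the pattern to the start of the string so that only the first comma can be replaced. The following is a minimal sketch, not a definitive solution: the helper name merge_unit_number is hypothetical, and it assumes the optional symbol is '#' and that the number has 1 to 5 digits, as described in the question.

```python
import re

# Assumption: the second field is an optional '#' followed by 1-5 digits.
# The ^ anchor plus [^,]* means only the first comma can ever match.
FIRST_COMMA = re.compile(r'^([^,]*),(?=#?\d{1,5},)')

def merge_unit_number(text):
    """Hypothetical helper: join the first field and a short numeric second field."""
    # \1 restores the first field; the comma after it becomes a space.
    return FIRST_COMMA.sub(r'\1 ', text, count=1)

samples = [
    'SOME STRING,99,1234 FIRST STREET,9998887777,ABC',
    'SOME OTHER STRING,56789 SECOND STREET,6665554444,DEF',
    'ANOTHER STRING,#88,4321 THIRD STREET,3332221111,GHI',
]

for s in samples:
    print(merge_unit_number(s))
```

The second sample is left unchanged because its second field continues with a space and letters rather than ending at a comma.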