Given a list as follows:
l = ['hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD',
'Car board price (tax included): JT Port',
'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North'
]
I would like to get the expected result as follows:
['hydrogenated benzene: SD', 'Car board price: JT Port', 'Ex-factory price: Triethanolamine: North']
With code below:
import re

def remove_extra(content):
    pat = r'\(.*\)'  # remove content within parentheses
    return re.sub(pat, '', str(content))

[remove_extra(item) for item in l]
It generates:
['hydrogenated benzene : SD',
'Car board price : JT Port',
'Ex-factory price : North']
As you may notice, the last element of the result, 'Ex-factory price : North',
is not as expected. How can I achieve what I need? Thanks.
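Your pattern \(.*\) is greedy: it matches from the first ( to the last ) in the string, so everything between two separate pairs of parentheses is removed as well.
You can modify the linked solution by adding \s* so that optional whitespace before each ( is removed too: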
import re

# https://stackoverflow.com/a/37538815/2901002
def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        # remove non-nested (flat) balanced parts, plus any whitespace before them
        text, n = re.subn(r'\s*\([^()]*\)', '', text)
    return text

a = [remove_text_between_parens(item) for item in l]
print(a)
['hydrogenated benzene: SD',
'Car board price: JT Port',
'Ex-factory price: Triethanolamine: North']
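To see why the loop is needed for nested parentheses, here is a small sketch that records the string after each pass. The helper name trace_passes is hypothetical (it is not part of the linked solution), and the two sample strings are taken from the list in the question.

```python
import re

def trace_passes(text):
    """Hypothetical helper: return the string after each removal pass."""
    passes = []
    while True:
        # Each pass removes only innermost (flat) parenthesised parts,
        # together with any whitespace directly before them.
        text, n = re.subn(r'\s*\([^()]*\)', '', text)
        if n == 0:
            break
        passes.append(text)
    return passes

nested = 'hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD'
flat = 'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North'

# Greedy pattern from the question, for comparison
print(re.sub(r'\(.*\)', '', flat))   # Ex-factory price : North

for step in trace_passes(nested):
    print(step)
# hydrogenated benzene (purity: 99.9 density, produced in ZB): SD
# hydrogenated benzene: SD

print(trace_passes(flat))
# ['Ex-factory price: Triethanolamine: North']
```

The nested string needs two passes because the first one can only match the inner (g/cm3); the flat string is finished in one pass because both groups are matched separately.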