python-3.x, pandas, python-re

Remove content with parentheses under multiple conditions in Python


Given a list as follows:

l = ['hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD', 
    'Car board price (tax included): JT Port', 
    'Ex-factory price (low-end price): Triethanolamine (85% commercial grade): North'
    ]

I would like to get the following result:

['hydrogenated benzene: SD', 'Car board price: JT Port', 'Ex-factory price: Triethanolamine: North']

With the code below:

import re

def remove_extra(content):
    # greedy match: spans from the first '(' to the last ')' in the string
    pat = r'\(.*\)'
    return re.sub(pat, '', str(content))

[remove_extra(item) for item in l]

It generates:

['hydrogenated benzene : SD',
 'Car board price : JT Port',
 'Ex-factory price : North']

As you can see, the last element of the result, 'Ex-factory price : North', is not as expected. How can I achieve what I need? Thanks.


Solution

  • You can modify the linked solution by adding \s* to the pattern, so that optional whitespace before each ( is removed as well:

    # https://stackoverflow.com/a/37538815/2901002
    import re

    def remove_text_between_parens(text):
        n = 1  # run at least once
        while n:
            # remove innermost (non-nested) balanced parts, plus any whitespace before them
            text, n = re.subn(r'\s*\([^()]*\)', '', text)
        return text

    a = [remove_text_between_parens(item) for item in l]
    print(a)
    
    ['hydrogenated benzene: SD', 
     'Car board price: JT Port', 
     'Ex-factory price: Triethanolamine: North']
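
To see why the loop is needed, it may help to watch what each pass removes. The sketch below is only an illustration, not part of the linked answer: `strip_parens_traced` and `INNERMOST` are hypothetical names, and the function records the string after every pass. The pattern `[^()]*` can only match a group with no parentheses inside it, so the first pass strips the innermost group and the next pass strips the group that enclosed it.

```python
import re

# Same pattern as above: optional whitespace, then a parenthesized group
# that contains no further parentheses (i.e. an innermost group).
INNERMOST = re.compile(r'\s*\([^()]*\)')

def strip_parens_traced(text):
    """Hypothetical helper: return the cleaned text and the string after each pass."""
    passes = []
    while True:
        text, n = INNERMOST.subn('', text)
        if n == 0:  # nothing left to remove
            break
        passes.append(text)
    return text, passes

s = 'hydrogenated benzene (purity: 99.9 density (g/cm3), produced in ZB): SD'
result, passes = strip_parens_traced(s)
for i, p in enumerate(passes, 1):
    print(i, repr(p))
# 1 'hydrogenated benzene (purity: 99.9 density, produced in ZB): SD'
# 2 'hydrogenated benzene: SD'
```

The greedy `\(.*\)` from the question instead matches from the first ( to the last ) in one step, which is why it swallowed "Triethanolamine" in the third element. Note that an unmatched parenthesis, as in 'a (b', is left in place by this approach.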