python regex extract parentheses brackets

Extract only specific words inside parentheses


I want to extract only specific words inside parentheses. For example, given the word list ['foo', 'bar'] and the string "alpha bravo (charlie foo bar delta) foxtrot", I want to get "alpha bravo foo bar foxtrot". Here is what I tried, but it did not work:

import re

word_list = ['foo', 'bar']
string = 'alpha bravo (charlie foo bar delta) foxtrot'
print(re.sub(r"\([^()]*\b({})\b[^()]*\)".format('|'.join(word_list)), r'\1', string, flags=re.I))

I expected "alpha bravo foo bar foxtrot", but the result was "alpha bravo bar foxtrot". How can I solve this?


Solution

  • Here is a regex-based approach that uses re.sub with a callback function:

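    import re

    word_list = ['foo', 'bar']
    regex = r'\b(?:' + '|'.join(word_list) + r')\b'         # \b(?:foo|bar)\b
    string = 'alpha bravo (charlie foo bar delta) foxtrot'

    def repl(m):
        # group(1) is non-empty only when the match is a parenthesized term
        if m.group(1):
            # keep only the listed words found inside the parentheses
            return ' '.join(re.findall(regex, m.group(1)))
        else:
            # a plain word outside parentheses: return it unchanged
            return m.group(0)
    
    print(re.sub(r'\((.*?)\)|\w+', repl, string))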
    

    This prints:

    alpha bravo foo bar foxtrot
    

    To explain: re.sub scans the whole string for the following pattern:

    \((.*?)\)|\w+
    

    The pattern first tries to match a term in parentheses, capturing its contents in group 1. For every match, re.sub passes the match object to the callback function repl(). When group 1 is non-empty, repl() uses re.findall with your list of words to keep only the wanted words from inside the parentheses and joins them with spaces. Otherwise, the pattern has matched a single word outside parentheses via \w+, and repl() returns it unchanged.
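
The same idea can be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the name keep_listed_words and its parameters are hypothetical, re.escape is an added precaution in case a word contains regex metacharacters, and the re.IGNORECASE default is an assumption based on the re.I flag in the question. It also matches only parenthesized terms, so text outside parentheses is left untouched without needing the \w+ alternative.

```python
import re

def keep_listed_words(text, words, flags=re.IGNORECASE):
    """Replace each parenthesized term with only the listed words found inside it."""
    # Escape each word so that characters such as '.' or '+' are matched literally.
    word_re = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', flags)

    def repl(m):
        # Keep the listed words in their original order, joined by single spaces.
        return ' '.join(word_re.findall(m.group(1)))

    # [^()]* keeps each match inside one pair of parentheses (nesting is not handled).
    return re.sub(r'\(([^()]*)\)', repl, text)

print(keep_listed_words('alpha bravo (charlie foo bar delta) foxtrot', ['foo', 'bar']))
# alpha bravo foo bar foxtrot
print(keep_listed_words('alpha (FOO x) bravo (bar y Foo)', ['foo', 'bar']))
# alpha FOO bravo bar Foo
```

Note that if a parenthesized term contains none of the listed words, it is replaced by an empty string, which can leave two consecutive spaces in the result.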