Tags: python, regex, parentheses, parse-tree

Remove only parentheses in nested parentheses


I have a bank of parse trees in an invalid format, where each word is wrapped in its own parentheses.

string = "(NP  (NN  (Police)) (SBAR  (SC (for)) (S  (NP-SBJ  (*)) (VP  (VB (secure)) (NP  (NN      (olympic games)))))))"

I tried to remove the parentheses around each word while keeping the word itself, but my attempt removes all of the parentheses:

re.sub(r'[\(\)]','',string)

This doesn't work either:

re.sub(r'\s\(.*\)\)', '', string)

I think the pattern needs to be based on the second closing parenthesis, as in:

(Police)) (for)) (*)) (secure)) (olympic games))

I want to remove only the parentheses that directly surround each word, keeping the word itself, like this. Any help?

result = (NP  (NN Police) (SBAR  (SC for) (S  (NP-SBJ  *) (VP  (VB secure) (NP  (NN  olympic games))))))

Solution

  • You may use

    re.sub(r'\(([^()]*)\)', r'\1', s)
    

    See the regex demo.

    Details

    • \( - a ( char
    • ([^()]*) - Group 1 (\1 refers to this group value from the replacement pattern): 0 or more chars other than parentheses
    • \) - a ) char
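To see what Group 1 captures, here is a minimal sketch that lists the matches with re.findall instead of replacing them. The variable names (pattern, words) are only illustrative. Because [^()]* cannot cross a parenthesis, only the innermost pairs match, and those are exactly the wrapped words:

```python
import re

s = "(NP  (NN  (Police)) (SBAR  (SC (for)) (S  (NP-SBJ  (*)) (VP  (VB (secure)) (NP  (NN      (olympic games)))))))"

# Same pattern as in re.sub: a "(", then Group 1 (no parentheses inside), then a ")".
pattern = re.compile(r'\(([^()]*)\)')

# With a single capturing group, findall returns the Group 1 text of each match.
words = pattern.findall(s)
print(words)
# => ['Police', 'for', '*', 'secure', 'olympic games']
```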

    See the Python demo:

    import re
    s = "(NP  (NN  (Police)) (SBAR  (SC (for)) (S  (NP-SBJ  (*)) (VP  (VB (secure)) (NP  (NN      (olympic games)))))))"
    print(re.sub(r'\(([^()]*)\)', r'\1', s))
    # => (NP  (NN  Police) (SBAR  (SC for) (S  (NP-SBJ  *) (VP  (VB secure) (NP  (NN      olympic games))))))
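One caveat, shown as a small sketch on a shortened, made-up tree: the pattern strips whatever the innermost pair is at the time it runs, so it should be applied exactly once to each malformed string. A second pass would treat the now-valid leaves such as (NN  Police) as innermost pairs and unwrap them too. The helper name unwrap_words is hypothetical, not part of the original answer.

```python
import re

INNERMOST = re.compile(r'\(([^()]*)\)')  # a pair with no parentheses inside


def unwrap_words(tree: str) -> str:
    """Drop the innermost parentheses pairs, keeping their content."""
    return INNERMOST.sub(r'\1', tree)


s = "(NP  (NN  (Police)) (VP  (VB (secure))))"

once = unwrap_words(s)      # the intended fix
twice = unwrap_words(once)  # over-applied: the leaf nodes lose their parentheses

print(once)
# => (NP  (NN  Police) (VP  (VB secure)))
print(twice)
# => (NP  NN  Police (VP  VB secure))
```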