I have a bank of parse trees in an invalid format, where the words are wrapped in parentheses.
string = "(NP (NN (Police)) (SBAR (SC (for)) (S (NP-SBJ (*)) (VP (VB (secure)) (NP (NN (olympic games)))))))"
I tried to remove the parentheses while keeping the word inside them, but I ended up removing all of the parentheses.
re.sub(r'[\(\)]','',string)
This doesn't work either:
re.sub(r'\s\(.*\)\)', '', string)
I think the pattern needs to be based on the second closing parenthesis, as in
(Police)) (for)) (*)) (secure)) (olympic games))
I want to remove only the parentheses that directly surround each word, keeping the word itself, like this. Any help?
result = (NP (NN Police) (SBAR (SC for) (S (NP-SBJ *) (VP (VB secure) (NP (NN olympic games))))))
You may use
re.sub(r'\(([^()]*)\)', r'\1', s)
See the regex demo.
Details
- \( - a ( char
- ([^()]*) - Group 1 (\1 refers to this group value from the replacement pattern): 0 or more chars other than parentheses
- \) - a ) char
See the Python demo:
import re
s = "(NP (NN (Police)) (SBAR (SC (for)) (S (NP-SBJ (*)) (VP (VB (secure)) (NP (NN (olympic games)))))))"
print(re.sub(r'\(([^()]*)\)', r'\1', s))
# => (NP (NN Police) (SBAR (SC for) (S (NP-SBJ *) (VP (VB secure) (NP (NN olympic games))))))
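Since the question mentions a whole bank of trees, here is a minimal sketch of applying the same pattern to several strings. The names LEAF and unwrap_leaves and the sample list are hypothetical, made up for illustration; only the regex comes from the answer above.

```python
import re

# Matches a parenthesized group that contains no nested parentheses,
# i.e. an innermost "(word)" leaf; group 1 captures the text inside.
# LEAF and unwrap_leaves are illustrative names, not from the original post.
LEAF = re.compile(r'\(([^()]*)\)')

def unwrap_leaves(tree: str) -> str:
    """Remove the parentheses around each innermost group, keeping its text."""
    return LEAF.sub(r'\1', tree)

# Hypothetical sample bank, cut down from the tree in the question.
trees = [
    "(NP (NN (Police)))",
    "(VP (VB (secure)) (NP (NN (olympic games))))",
]
for t in trees:
    print(unwrap_leaves(t))
# => (NP (NN Police))
# => (VP (VB secure) (NP (NN olympic games)))
```

Run it only once per tree. After the first pass, a node such as (NN Police) is itself an innermost group, so a second pass (or a tree whose words were never wrapped) would lose that node's parentheses too.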