
Using Tsurgeon for recursive stripping of phrases


I would like to "simplify" phrases by recursively stripping modifiers. For example, given a tree with two embedded PPs

(NP (NP (JJ Microbial) (NN expression)) (PP (IN in) (NP (NP (DT the) (NN rhizosphere)) (PP (IN of) (NP (NNS willows))))))

I want to derive first

(NP (NP (JJ Microbial) (NN expression)) (PP (IN in) (NP (NP (DT the) (NN rhizosphere)))))

and second

(NP (NP (JJ Microbial) (NN expression)))

However, a script like this

PP=pp !<< PP

delete pp

will delete both PPs right away: Tsurgeon keeps re-applying a pattern until it no longer matches, and once the inner PP is gone the outer PP no longer dominates a PP, so it matches as well.

Is there a way to force Tsurgeon to apply the operation only once or is there some other trick to accomplish this?


Solution

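  • Do it in two steps:

    1. Mark the innermost PP for deletion by relabeling it. The extra !<< DELETE_ME condition stops an outer PP from matching once the PP inside it has been marked.

    Like this: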

    PP=pp !<< PP !<< DELETE_ME
    relabel pp DELETE_ME
    
    2. Delete all marked nodes.

    Simply:

    DELETE_ME=pp
    delete pp
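
To see why marking first prevents the cascade, here is a minimal plain-Python model of the two steps. It does not call Tsurgeon or any Stanford NLP API; the helper names (parse, show, dominates, mark, sweep, strip_once) are made up for this sketch, and it assumes well-formed Penn-style bracketing whose root is not itself a PP.

```python
import re

def parse(s):
    """Parse a bracketed tree into nested lists [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0
    def node():
        nonlocal pos
        pos += 1                      # skip "("
        tree = [tokens[pos]]          # node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                tree.append(node())
            else:
                tree.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return tree
    return node()

def show(t):
    """Render a nested-list tree back into bracketed form."""
    if isinstance(t, str):
        return t
    return "(" + " ".join([t[0]] + [show(c) for c in t[1:]]) + ")"

def dominates(t, label):
    """True if some proper descendant of t has the given label (models <<)."""
    return any(isinstance(c, list) and (c[0] == label or dominates(c, label))
               for c in t[1:])

def mark(t):
    """Step 1: relabel each PP that dominates neither a PP nor a DELETE_ME."""
    for c in t[1:]:
        if isinstance(c, list):
            mark(c)                   # inner nodes are handled before outer ones
    if t[0] == "PP" and not dominates(t, "PP") and not dominates(t, "DELETE_ME"):
        t[0] = "DELETE_ME"

def sweep(t):
    """Step 2: delete every node labelled DELETE_ME (the root is never removed)."""
    t[1:] = [c for c in t[1:] if not (isinstance(c, list) and c[0] == "DELETE_ME")]
    for c in t[1:]:
        if isinstance(c, list):
            sweep(c)

def strip_once(s):
    """One round of mark-then-delete: removes only the innermost PP layer."""
    t = parse(s)
    mark(t)
    sweep(t)
    return show(t)

if __name__ == "__main__":
    tree = ("(NP (NP (JJ Microbial) (NN expression)) (PP (IN in) "
            "(NP (NP (DT the) (NN rhizosphere)) (PP (IN of) (NP (NNS willows))))))")
    first = strip_once(tree)
    second = strip_once(first)
    print(first)
    print(second)
```

Each call to strip_once corresponds to one run of the two steps above, so repeating the round on its own output peels off one PP layer at a time, which gives the first and second trees from the question.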