
How can I obtain NP and VP subtrees from the Stanford Parser using a Spanish model?

I am currently working on triplet extraction from Spanish text using Java. I need to extract triplets of the form NP-VP-NP. I'm using Stanford CoreNLP v3.7.0 with the Spanish models v3.7.0. My question is: is there a way to extract NP subtrees and VP subtrees from a sentence with the Spanish model? I realize that the Spanish parse tree format is different from the English one. For example:


(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino))) (grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))
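
To make the label difference concrete: in this tag set `sn` (sintagma nominal) plays the role of NP, `grup.nom` is the nominal group inside it, and `grup.verb` holds the verb group. Here is a rough sketch in plain Java, with no CoreNLP dependency, that scans a bracketed parse string and pulls out the subtrees under a given label. The class name `SpanishTreeLabels` and the helper `subtrees` are made up for illustration, and the tree string is a balanced variant of the output above, assuming `grup.verb` closes right after the verb. It also assumes tokens never contain raw parentheses. On a real `Tree` object, Tregex is the more robust route.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanishTreeLabels {

  /**
   * Returns every bracketed subtree of {@code tree} whose node label equals {@code label}.
   * Hypothetical helper: a plain string scan, not part of CoreNLP.
   */
  public static List<String> subtrees(String tree, String label) {
    List<String> found = new ArrayList<>();
    String open = "(" + label + " ";
    int from = 0;
    while (true) {
      int start = tree.indexOf(open, from);
      if (start < 0) {
        break; // no further node with this label
      }
      // walk forward until the parenthesis opened at 'start' is closed again
      int depth = 0;
      int end = -1;
      for (int i = start; i < tree.length(); i++) {
        char c = tree.charAt(i);
        if (c == '(') {
          depth++;
        } else if (c == ')' && --depth == 0) {
          end = i;
          break;
        }
      }
      if (end < 0) {
        break; // unbalanced input, stop scanning
      }
      found.add(tree.substring(start, end + 1));
      from = start + 1; // continue after this opening bracket so nested matches are found too
    }
    return found;
  }

  public static void main(String[] args) {
    // balanced variant of the parser output (assumption: grup.verb contains only the verb)
    String tree = "(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino)))"
        + " (grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))";
    System.out.println("NP-like subtrees: " + subtrees(tree, "sn"));
    System.out.println("VP-like subtrees: " + subtrees(tree, "grup.verb"));
  }
}
```

Matching on `"(" + label + " "` keeps `sn` from accidentally matching `sadv` or other labels that merely share a prefix.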


  • You should use the main distribution to make sure you have everything and download the Spanish models

    (available here:

    package edu.stanford.nlp.examples;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.trees.*;
    import edu.stanford.nlp.trees.tregex.*;
    import edu.stanford.nlp.util.*;
    import java.util.*;
    public class TregexExample {
      public static void main(String[] args) {
        // set up pipeline
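        // load the Spanish defaults (assumes StanfordCoreNLP-spanish.properties
        // from the Spanish models jar is on the classpath)
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-spanish.properties");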
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Spanish example
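        Annotation spanishDoc = new Annotation("...insert Spanish text...");
        // run the pipeline; without this step the document has no sentences or parse trees
        pipeline.annotate(spanishDoc);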
        // get first sentence
        CoreMap firstSentence = spanishDoc.get(CoreAnnotations.SentencesAnnotation.class).get(0);
        Tree firstSentenceTree = firstSentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        // use Tregex to match
        String nounPhrasePattern = "/grup\\.nom/";
        TregexPattern nounPhraseTregexPattern = TregexPattern.compile(nounPhrasePattern);
        TregexMatcher nounPhraseTregexMatcher = nounPhraseTregexPattern.matcher(firstSentenceTree);
        while (nounPhraseTregexMatcher.find()) {