java, stanford-nlp

How can I obtain NP and VP subtrees from the Stanford Parser using a Spanish model?


I am currently working on extracting triplets from Spanish text using Java. I need to extract triplets of the form NP-VP-NP. I'm using Stanford CoreNLP v3.7.0 with the Spanish model v3.7.0. My question is: is there a way to extract NP subtrees and VP subtrees from a sentence with the Spanish model? I realize that the Spanish parse tree format is different from the English one.

Example:

(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino))) (grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))


Solution

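  • Use the main CoreNLP distribution to make sure you have everything, and download the Spanish models (available here: http://stanfordnlp.github.io/CoreNLP/download.html). The Spanish model does not use NP and VP labels; as your example tree shows, noun groups are labeled grup.nom (inside sn) and verb groups are labeled grup.verb, so you can match those labels with Tregex: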

    package edu.stanford.nlp.examples;
    
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.trees.*;
    import edu.stanford.nlp.trees.tregex.*;
    import edu.stanford.nlp.util.*;
    
    import java.util.*;
    
    
    public class TregexExample {
    
      public static void main(String[] args) {
        // set up pipeline
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-spanish.properties");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // Spanish example
        Annotation spanishDoc = new Annotation("...insert Spanish text...");
        pipeline.annotate(spanishDoc);
        // get first sentence
        CoreMap firstSentence = spanishDoc.get(CoreAnnotations.SentencesAnnotation.class).get(0);
        Tree firstSentenceTree = firstSentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        // use Tregex to match noun groups; for verb groups, use the pattern "/grup\\.verb/" instead
        String nounPhrasePattern = "/grup\\.nom/";
        TregexPattern nounPhraseTregexPattern = TregexPattern.compile(nounPhrasePattern);
        TregexMatcher nounPhraseTregexMatcher = nounPhraseTregexPattern.matcher(firstSentenceTree);
        while (nounPhraseTregexMatcher.find()) {
          nounPhraseTregexMatcher.getMatch().pennPrint();
        }
      }
    }
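To make it clearer what the Tregex loop above is doing, here is a rough, library-free sketch that reads a bracketed parse in the style of the question's example and collects the words under every node with a given label. It is only an illustration: `BracketedTreeSketch`, `Node`, `parse`, and `phrases` are hypothetical names that are not part of CoreNLP, and treating `sn` as the NP analogue and `grup.verb` as the VP analogue is an assumption based on the example tree. In real code, running Tregex on the `Tree` returned by the pipeline is likely the better tool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for illustration only; not part of Stanford CoreNLP.
public class BracketedTreeSketch {

  /** Minimal tree node: a label plus children; a leaf (word) has no children. */
  public static final class Node {
    public final String label;
    public final List<Node> children = new ArrayList<>();

    Node(String label) { this.label = label; }

    /** Returns the words under this node, joined by single spaces. */
    public String words() {
      if (children.isEmpty()) return label;
      List<String> parts = new ArrayList<>();
      for (Node child : children) parts.add(child.words());
      return String.join(" ", parts);
    }
  }

  /** Parses a well-formed bracketed tree such as "(ROOT (sn (da0000 El)))". */
  public static Node parse(String bracketed) {
    String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    Deque<Node> stack = new ArrayDeque<>();
    Node root = null;
    for (int i = 0; i < tokens.length; i++) {
      String token = tokens[i];
      if (token.equals("(")) {
        Node node = new Node(tokens[++i]);            // the label follows "("
        if (!stack.isEmpty()) stack.peek().children.add(node);
        stack.push(node);
      } else if (token.equals(")")) {
        root = stack.pop();                           // the last node popped is the root
      } else {
        stack.peek().children.add(new Node(token));   // a leaf word
      }
    }
    return root;
  }

  /** Collects the word span of every non-leaf node whose label equals the given label. */
  public static List<String> phrases(Node root, String label) {
    List<String> out = new ArrayList<>();
    collect(root, label, out);
    return out;
  }

  private static void collect(Node node, String label, List<String> out) {
    if (!node.children.isEmpty() && node.label.equals(label)) out.add(node.words());
    for (Node child : node.children) collect(child, label, out);
  }

  public static void main(String[] args) {
    // A balanced parse in the style of the question's example.
    String tree = "(ROOT (sentence (sn (spec (da0000 El)) (grup.nom (nc0s000 reino)))"
        + " (grup.verb (vmm0000 canta)) (sadv (spec (rg muy)) (grup.adv (rg bien))) (fp .)))";
    Node root = parse(tree);
    System.out.println("sn: " + phrases(root, "sn"));               // sn: [El reino]
    System.out.println("grup.verb: " + phrases(root, "grup.verb")); // grup.verb: [canta]
  }
}
```

For NP-VP-NP triplets you would then look for an `sn`, a `grup.verb`, and a following `sn` among the children of the same `sentence` node. That pairing logic is left out here because it depends on how your sentences are structured.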