
Stanford CoreNLP: how to get the label, position, and typed dependencies from a parse tree


I am using Stanford CoreNLP to parse some text, which yields multiple sentences. From these sentences I managed to extract noun phrases using TregexPattern, so each match gives me a child Tree that is my noun phrase. I also managed to find the head of the noun phrase.

How can I get the position, or even the token/CoreLabel, of that head in the sentence?

Even better, how can I find the dependency relations between the head and the rest of the sentence?

Here's an example:

public void doSomeTextKarate(String text){

    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;


    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {


        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==>  "+typedDeps);

        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);

        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object> > sentenceDeps =   sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);

            System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
        }


        // find noun phrases in the sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);

            Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps ) {
                System.out.println("nounPhraseTree  dep = " + dependency);
            }


            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);


            Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }


            // QUESTION:
            // How do I get the position of "head" in tokens or numerizedTokens?
            // How do I get the dependencies in typedDeps that involve "head"?

        }
    }
}

In other words, I would like to query for ALL dependency relations in the ENTIRE sentence that involve the "head" word/token/label. I thought I needed to find the position of that token in the sentence in order to correlate it with the typed dependencies, but maybe there is an easier way?

Thanks in advance.

[EDIT]

I might have found an answer, or at least the beginning of one.

If I call .label() on head I get a CoreLabel, which is pretty much what I needed to find the rest. I can now iterate over the typed dependencies and look for those where either the governor label or the dependent label has the same index as my headLabel.

            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);

            nounPhraseTree.percolateHeads(headFinder);
            Tree head = nounPhraseTree.headTerminal(headFinder);
            CoreLabel headLabel = (CoreLabel) head.label();

            System.out.println("tokens.contains(headLabel) = " + tokens.contains(headLabel));

            System.out.println("");
            System.out.println("Iterating over typed deps");
            for (TypedDependency typedDependency : typedDeps) {
                System.out.println(typedDependency.gov().backingLabel());
                System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
                System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());

                if(typedDependency.gov().index() == headLabel.index() ){

                    // Why does this always print false?
                    System.out.println("dep or gov backing label equals headLabel: " + (typedDependency.gov().backingLabel().equals(headLabel) ||
                            typedDependency.dep().backingLabel().equals(headLabel)));


                    System.out.println(" !!!!!!!!!!!!!!!!!!!!!  HIT ON " + headLabel + " == " + typedDependency.gov());
                }
            }

So it seems I can only match my head's label against the ones in typedDeps by using the index. I wonder whether this is the proper way to do it. As you can see in my code, I also tried to use backingLabel() on the governor and on the dependent to test equality with my headLabel, but the comparison always returns false. I wonder why.

Any feedback appreciated.


Solution

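  • You can get the position of a CoreLabel within its containing sentence from the CoreAnnotations.IndexAnnotation annotation. The headLabel.index() call in your edit reads that same value. Note that CoreNLP token indices start at 1, so the matching entry in your tokens list is at position index - 1.

    Your method of finding all dependencies that involve a given word, comparing the indices of the governor and the dependent with the index of the head, seems correct and is probably the easiest way to do it.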
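
    The index-matching idea can be sketched without the CoreNLP jars. The example below is only an illustration under stated assumptions: Token and Dep are hypothetical stand-ins for CoreLabel and TypedDependency (they are not CoreNLP classes), and the sample sentence and its relations are made up for the demo. In real code the comparison would be the one from the question, typedDependency.gov().index() == headLabel.index(), plus the same check on dep().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT CoreNLP classes:
// Token mimics the word and 1-based index carried by a CoreLabel,
// Dep mimics a TypedDependency (relation, governor, dependent).
class HeadDependencyLookup {

    static final class Token {
        final String word;
        final int index; // 1-based position in the sentence; 0 is the artificial ROOT

        Token(String word, int index) {
            this.word = word;
            this.index = index;
        }
    }

    static final class Dep {
        final String relation;
        final Token gov;
        final Token dep;

        Dep(String relation, Token gov, Token dep) {
            this.relation = relation;
            this.gov = gov;
            this.dep = dep;
        }

        @Override
        public String toString() {
            return relation + "(" + gov.word + "-" + gov.index + ", " + dep.word + "-" + dep.index + ")";
        }
    }

    /** Returns every dependency whose governor or dependent sits at headIndex. */
    static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.gov.index == headIndex || d.dep.index == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    /** Converts a 1-based token index into a 0-based position in the token list. */
    static int listPosition(int tokenIndex) {
        return tokenIndex - 1;
    }

    /** Made-up dependencies for the sentence "The dog barks". */
    static List<Dep> sampleDeps() {
        Token root = new Token("ROOT", 0);
        Token the = new Token("The", 1);
        Token dog = new Token("dog", 2);
        Token barks = new Token("barks", 3);
        return Arrays.asList(
                new Dep("det", dog, the),
                new Dep("nsubj", barks, dog),
                new Dep("root", root, barks));
    }

    public static void main(String[] args) {
        int headIndex = 2; // index of "dog", the head of the noun phrase "The dog"
        for (Dep d : involving(sampleDeps(), headIndex)) {
            System.out.println(d);
        }
        System.out.println("position in token list: " + listPosition(headIndex));
    }
}
```

    Comparing plain integer indices avoids relying on label equality, which is the part that kept returning false in the question.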