I am using Stanford CoreNLP to parse some text that contains multiple sentences. For each sentence I extract noun phrases with a TregexPattern, so every match gives me a child Tree that is my noun phrase. I have also managed to find the head of the noun phrase.
How can I get the position, or even the token/CoreLabel, of that head in the sentence?
Even better, how can I find the dependency relationships between the head and the rest of the sentence?
Here's an example:
public void doSomeTextKarate(String text){
    Properties props = new Properties();
    props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    this.pipeline = pipeline;
    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
        Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
        System.out.println("typedDeps ==> "+typedDeps);
        SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
        SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
        Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        sentenceTree.percolateHeads(headFinder);
        Set<Dependency<Label, Label, Object> > sentenceDeps = sentenceTree.dependencies();
        for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
            System.out.println("sentence dep = " + dependency);
            System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
        }
        // find noun phrases in the sentence
        TregexPattern pat = TregexPattern.compile("@NP");
        TregexMatcher matcher = pat.matcher(sentenceTree);
        while (matcher.find()) {
            Tree nounPhraseTree = matcher.getMatch();
            System.out.println("Found noun phrase " + nounPhraseTree);
            nounPhraseTree.percolateHeads(headFinder);
            Set<Dependency<Label, Label, Object> > npDeps = nounPhraseTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : npDeps ) {
                System.out.println("nounPhraseTree dep = " + dependency);
            }
            Tree head = nounPhraseTree.headTerminal(headFinder);
            System.out.println("head " + head);
            Set<Dependency<Label, Label, Object> > headDeps = head.dependencies();
            for (Dependency<Label, Label, Object> dependency : headDeps) {
                System.out.println("head dep " + dependency);
            }
            // QUESTION:
            // How do I get the position of "head" in tokens or numerizedTokens?
            // How do I get the dependencies where "head" is involved in typedDeps?
        }
    }
}
In other words, I would like to query for ALL dependency relationships in the ENTIRE sentence that involve the "head" word/token/label. I thought I needed to find the position of that token in the sentence so I could correlate it with the typed dependencies, but maybe there is an easier way?
Thanks in advance.
[EDIT]
So I might have found an answer or the beginning of it.
If I call .label() on head I get a CoreLabel, which is pretty much what I needed to find the rest. I can now iterate over the typed dependencies and look for those where either the governor label or the dependent label has the same index as my headLabel.
Tree nounPhraseTree = matcher.getMatch();
System.out.println("Found noun phrase " + nounPhraseTree);
nounPhraseTree.percolateHeads(headFinder);
Tree head = nounPhraseTree.headTerminal(headFinder);
CoreLabel headLabel = (CoreLabel) head.label();
System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));
System.out.println("");
System.out.println("Iterating over typed deps");
for (TypedDependency typedDependency : typedDeps) {
    System.out.println(typedDependency.gov().backingLabel());
    System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
    System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());
    if(typedDependency.gov().index() == headLabel.index() ){
        System.out.println("dep or gov backing label equals headlabel :" + (typedDependency.gov().backingLabel().equals(headLabel) ||
            typedDependency.dep().backingLabel().equals(headLabel))); //why does this return false all the time ?
        System.out.println(" !!!!!!!!!!!!!!!!!!!!! HIT ON " + headLabel + " == " + typedDependency.gov());
    }
}
So it seems I can only match my head's label with the one from the typedDeps by using the index. I wonder if this is the proper way to do it. As you can see in my code, I also tried to use TypedDependency.backingLabel() to test equality between my headLabel and either the governor or the dependent, but it systematically returns false. I wonder why!?
Any feedback appreciated.
You can get the position of a CoreLabel within its containing sentence with the CoreAnnotations.IndexAnnotation annotation.
Your method for finding all dependents of a given word seems correct, and is probably the easiest way to do it.
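To make the index-matching idea concrete, here is a minimal stand-alone sketch in plain Java. It does not call CoreNLP: `Dep`, `HeadDeps`, `tokenPosition` and `involving` are hypothetical stand-ins invented for illustration, and the dependencies are written by hand. It assumes token indices are 1-based within the sentence, so the position in the token list is the index minus one. In real code the same comparison would presumably use `headLabel.index()` against `typedDependency.gov().index()` and `typedDependency.dep().index()`, as in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the index-matching approach.
// Dep and HeadDeps are hypothetical stand-ins, not CoreNLP classes.
class HeadDeps {

    // Minimal stand-in for one typed dependency: relation(gov-govIndex, dep-depIndex).
    record Dep(String relation, String gov, int govIndex, String dep, int depIndex) {}

    // Assumption: token indices start at 1, so the list position is index - 1.
    static int tokenPosition(int index) {
        return index - 1;
    }

    // Keep every dependency in which the head is either the governor or the dependent.
    static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex() == headIndex || d.depIndex() == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "The dog chased the cat": hand-written dependencies, for illustration only.
        List<String> tokens = List.of("The", "dog", "chased", "the", "cat");
        List<Dep> deps = List.of(
            new Dep("det", "dog", 2, "The", 1),
            new Dep("nsubj", "chased", 3, "dog", 2),
            new Dep("det", "cat", 5, "the", 4),
            new Dep("dobj", "chased", 3, "cat", 5));

        int headIndex = 2; // index of "dog", the head of the noun phrase "The dog"
        System.out.println(tokens.get(tokenPosition(headIndex)));
        for (Dep d : involving(deps, headIndex)) {
            System.out.println(d.relation() + "(" + d.gov() + "-" + d.govIndex()
                + ", " + d.dep() + "-" + d.depIndex() + ")");
        }
    }
}
```

Note that the filter checks both sides of each dependency, whereas the loop in the question only tests the governor's index, so it would miss relations in which the head is the dependent.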