I have been experimenting with the Stanford CoreNLP coreference resolution system, and it works as described in the tutorial. Here is my code:
    import edu.stanford.nlp.coref.CorefCoreAnnotations;
    import edu.stanford.nlp.coref.data.CorefChain;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class CorefDemo {
        public static void main(String[] args) throws Exception {
            Annotation document = new Annotation("Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.");
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            pipeline.annotate(document);
            System.out.println("---");
            System.out.println("coref chains");
            for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
                System.out.println("\t" + cc);
            }
        }
    }
This outputs:

    CHAIN3-["Barack Obama" in sentence 1, "He" in sentence 1]
What I want instead is a map from each mention to the mention it refers to:

    Key   | Value
    He    | Barack Obama
    Obama | Barack Obama
Is there a built-in method for this, or do I have to post-process the chains myself (and not only to build the map)?
At the moment there is no built-in code for that. Here is a snippet that prints each mention's text (its gloss), its position, and the representative (canonical) mention of its chain:
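    for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
        // The chain's representative ("canonical") mention, e.g. "Barack Obama"
        CorefChain.CorefMention representativeMention = cc.getRepresentativeMention();
        for (CorefChain.CorefMention cm : cc.getMentionsInTextualOrder()) {
            // sentNum is the sentence number; startIndex is the token where the mention starts
            String position = "sentence num: " + cm.sentNum + " position: " + cm.startIndex;
            // Mention text, its position, and the representative mention it resolves to
            System.out.println(cm.mentionSpan + "\t" + position + "\t" + representativeMention.mentionSpan);
        }
    }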
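To get from that snippet to the map asked for in the question, here is a minimal sketch of the post-processing step. It assumes Java 16+ (for records), and `Mention` and `Chain` are hypothetical stand-ins for `CorefChain.CorefMention` and `CorefChain`, so the logic compiles and runs without CoreNLP on the classpath. The sample chain in `main` is hand-built from the question's sentences for illustration, not real CoreNLP output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MentionMap {

    // Hypothetical stand-in for CorefChain.CorefMention (mention text plus position fields).
    public record Mention(String mentionSpan, int sentNum, int startIndex) {}

    // Hypothetical stand-in for CorefChain: representative mention plus all mentions in textual order.
    public record Chain(Mention representative, List<Mention> mentionsInTextualOrder) {}

    // Maps each non-representative mention's text to the text of its chain's representative mention.
    public static Map<String, String> build(List<Chain> chains) {
        Map<String, String> map = new LinkedHashMap<>(); // preserves insertion (textual) order
        for (Chain chain : chains) {
            Mention rep = chain.representative();
            for (Mention m : chain.mentionsInTextualOrder()) {
                // Skip the representative itself, identified by its position.
                if (m.sentNum() == rep.sentNum() && m.startIndex() == rep.startIndex()) {
                    continue;
                }
                // Keyed by surface text: a string repeated across chains keeps the last value seen.
                map.put(m.mentionSpan(), rep.mentionSpan());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Hand-built chain mirroring the question's text (illustrative, not real CoreNLP output).
        Mention barack = new Mention("Barack Obama", 1, 1);
        Chain chain = new Chain(barack, List.of(
                barack, new Mention("He", 2, 1), new Mention("Obama", 3, 1)));
        for (Map.Entry<String, String> e : build(List.of(chain)).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

Running `main` prints `He : Barack Obama` followed by `Obama : Barack Obama`. With CoreNLP, the same loop should carry over by iterating `cc.getMentionsInTextualOrder()`, reading `cm.mentionSpan`, and skipping the mention whose `sentNum` and `startIndex` match those of `cc.getRepresentativeMention()`. Keying by surface text matches the map in the question, but it collapses repeated strings (two different "He" mentions become one key); keying by sentence number and start index avoids that.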