java, stanford-nlp

core-nlp coreference resolution: remapping co-references


I have been experimenting with the core-nlp co-reference resolution system, and it works as explained in the tutorial. Here is my code:

import java.util.Properties;
import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class CorefExample {
    public static void main(String[] args) throws Exception {
        Annotation document = new Annotation("Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008.");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        System.out.println("---");
        System.out.println("coref chains");
        for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
            System.out.println("\t" + cc);
        }
    }
}

This outputs:

CHAIN3-["Barack Obama" in sentence 1, "He" in sentence 1]

What I want instead is a map from each mention to the mention it refers to:

Key   | Value
He    | Barack Obama
Obama | Barack Obama

Is there a built-in method to achieve this (not just the map), or do I have to post-process the output myself?


Solution

  • At the moment there is no built-in code for that. Here is a snippet that prints each mention's gloss, its position, and the canonical (representative) mention of its chain:

    for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
        // The representative mention is the canonical mention of this chain.
        CorefChain.CorefMention representativeMention = cc.getRepresentativeMention();
        for (CorefChain.CorefMention cm : cc.getMentionsInTextualOrder()) {
            String position = "sentence num: " + cm.sentNum + " position: " + cm.startIndex;
            System.out.println(cm.mentionSpan + "\t" + position + "\t" + representativeMention.mentionSpan);
        }
    }
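
To get the map asked for in the question, the same loop can collect pairs instead of printing them. The following is only a minimal sketch of that post-processing step, not part of CoreNLP: `Mention`, `Chain`, `key` and `remap` are hypothetical stand-ins so the example runs without the library, and the sample positions are made up. With CoreNLP you would presumably fill them from `cc.getRepresentativeMention()`, `cc.getMentionsInTextualOrder()`, `cm.mentionSpan`, `cm.sentNum` and `cm.startIndex`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Mention and Chain are hypothetical stand-ins, not CoreNLP classes.
class CorefRemap {

    /** Stand-in for the fields read from CorefChain.CorefMention. */
    static final class Mention {
        final String span;     // surface text, like cm.mentionSpan
        final int sentNum;     // sentence number, like cm.sentNum
        final int startIndex;  // token position, like cm.startIndex

        Mention(String span, int sentNum, int startIndex) {
            this.span = span;
            this.sentNum = sentNum;
            this.startIndex = startIndex;
        }
    }

    /** Stand-in for one CorefChain: its representative mention plus all mentions. */
    static final class Chain {
        final Mention representative;
        final List<Mention> mentions;

        Chain(Mention representative, List<Mention> mentions) {
            this.representative = representative;
            this.mentions = mentions;
        }
    }

    /** Key that stays unique when the same word (e.g. "He") occurs more than once. */
    static String key(Mention m) {
        return m.sentNum + ":" + m.startIndex + ":" + m.span;
    }

    /** Maps every non-representative mention to its chain's representative span. */
    static Map<String, String> remap(List<Chain> chains) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Chain chain : chains) {
            for (Mention m : chain.mentions) {
                // Identity check is enough for this sketch; use equals() on real mentions.
                if (m == chain.representative) {
                    continue; // the canonical mention does not need remapping
                }
                result.put(key(m), chain.representative.span);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data only; positions are made up, not real CoreNLP output.
        Mention obama = new Mention("Barack Obama", 1, 1);
        Mention he = new Mention("He", 2, 1);
        Mention obama2 = new Mention("Obama", 3, 1);
        List<Chain> chains = new ArrayList<>();
        chains.add(new Chain(obama, Arrays.asList(obama, he, obama2)));

        for (Map.Entry<String, String> e : remap(chains).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The key includes the sentence number and token position because a plain `String` key such as "He" would be overwritten whenever the same pronoun refers to different entities in one document. If that cannot happen in your data, keying on the span alone gives exactly the table from the question.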