Coreference Resolution with CoreNLP


I am trying to access the CorefChains that CoreNLP produces. My goal is to replace pronouns such as "he" and "she" with their best (representative) mention, but the CorefChains I get back are always null.

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
        props.put("dcoref.score", true);
        StanfordCoreNLP corefPipeline = new StanfordCoreNLP(props);
        String text = "Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008.";
        Annotation document = new Annotation(text);
        corefPipeline.annotate(document);
        // chains is always null
        Map<Integer, CorefChain> chains = document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
    }

Solution

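  • I think this is an import problem. CoreNLP ships CorefChain and CorefCoreAnnotations in both the older edu.stanford.nlp.dcoref package and the newer edu.stanford.nlp.coref package. If your imports come from the older package, the annotation key you pass to get() probably does not match the one the pipeline writes, so the lookup returns null. With the imports below, the same code works fine: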

    import java.util.Map;
    import java.util.Properties;
    
    import edu.stanford.nlp.coref.CorefCoreAnnotations;
    import edu.stanford.nlp.coref.data.CorefChain;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    
    
    public class App {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
            props.put("dcoref.score", true);
            StanfordCoreNLP corefPipeline = new StanfordCoreNLP(props);
            String text = "Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008.";
            Annotation document = new Annotation(text);
            corefPipeline.annotate(document);
            // With the edu.stanford.nlp.coref imports above, chains is populated
            Map<Integer, CorefChain> chains = document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
            System.out.println(chains);
        }
    }
    

    And the output:

    {1=CHAIN1-["Barack Obama" in sentence 1, "He" in sentence 2, "the president" in sentence 2, "Obama" in sentence 3], 2=CHAIN2-["Hawaii" in sentence 1], 6=CHAIN6-["2008" in sentence 3]}
    

    Here is what I have in pom.xml:

    <dependencies>
        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>3.9.2</version>
        </dependency>
        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>3.9.2</version>
            <classifier>models</classifier>
        </dependency>
    </dependencies>
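
Once the chains are populated, the substitution you describe could be sketched roughly as below. This is a minimal, library-free sketch, not CoreNLP code: the Span class and the substitute method are hypothetical names introduced here, and the token lists are typed in by hand. With CoreNLP you would, as far as I know, fill them from CorefChain.getRepresentativeMention() and CorefChain.getMentionsInTextualOrder(), whose CorefMention objects expose sentNum, startIndex, endIndex and mentionSpan; those indices are 1-based there, so check them against your version before converting to the 0-based spans assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MentionSubstitution {

    /** Hypothetical stand-in for a coref mention: tokens [start, end) of one sentence. */
    static final class Span {
        final int sentence; // 0-based sentence index
        final int start;    // 0-based index of the first token
        final int end;      // 0-based index one past the last token

        Span(int sentence, int start, int end) {
            this.sentence = sentence;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns a copy of the sentences in which every span is replaced by the
     * representative text. Assumes the spans do not overlap.
     */
    static List<List<String>> substitute(List<List<String>> sentences,
                                         List<Span> spans,
                                         String representative) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> sentence : sentences) {
            result.add(new ArrayList<>(sentence));
        }
        // Process spans right-to-left so that earlier token indices stay valid
        // after a replacement changes the sentence length.
        List<Span> ordered = new ArrayList<>(spans);
        ordered.sort(Comparator.comparingInt((Span s) -> s.start).reversed());
        for (Span span : ordered) {
            List<String> tokens = result.get(span.sentence);
            tokens.subList(span.start, span.end).clear();
            tokens.add(span.start, representative);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Barack", "Obama", "was", "born", "in", "Hawaii", "."),
            Arrays.asList("He", "is", "the", "president", "."));
        // Replace only the pronoun "He": sentence 1, tokens [0, 1).
        List<Span> pronouns = Arrays.asList(new Span(1, 0, 1));
        for (List<String> sentence : substitute(sentences, pronouns, "Barack Obama")) {
            System.out.println(String.join(" ", sentence));
        }
    }
}
```

In practice you would probably want to replace only pronominal mentions (as done by hand above), since rewriting "the president" or a later "Obama" with the representative text tends to make the output read oddly.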