
How to use quote annotator


Running

./corenlp.sh -annotators quote -outputFormat xml -file input.txt

on the modified input file

"Stanford University" is located in California. It is a great university, founded in 1891.

yields the following output:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>
<root>
  <document>
    <sentences/>
  </document>
</root>

Maybe I misunderstood the intended use of this annotator, but I expected it to mark the parts of the sentence that are between the quotation marks (").

When I run the script with the "usual" annotators tokenize,ssplit,pos,lemma,ner, they all work well, but adding quote does not change the output. I am using the stanford-corenlp-full-2015-12-09 release. How can I use the quote annotator, and what is it meant to do?


Solution

  • If you build a StanfordCoreNLP object in Java code and run it with the quote annotator, the final Annotation object will have the quotes.

    import java.io.IOException;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    
    public class PipelineExample {
    
        public static void main (String[] args) throws IOException {
            // Build a pipeline with the tokenize, ssplit, and quote annotators.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, quote");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            String text = "\"Stanford University\" is located in California. It is a great university, founded in 1891.";
            // Wrap the raw text in an Annotation and run the pipeline on it.
            Annotation annotation = new Annotation(text);
            pipeline.annotate(annotation);
            // Print the quotes that the quote annotator attached to the document.
            System.out.println(annotation.get(CoreAnnotations.QuotationsAnnotation.class));
        }
    }
    

    Currently, none of the outputters (json, xml, text, etc.) output the quotes, which is why they do not appear in your XML. I'll make a note that we should add this to the output in future versions.
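
To illustrate what "marking the part between the quotation marks" amounts to, here is a minimal plain-Java sketch that finds spans enclosed in straight double quotes and records their character offsets. It is not CoreNLP's implementation and uses no CoreNLP classes: `QuoteSpanSketch`, `Span`, and `findQuotes` are hypothetical names made up for this example, and it assumes only straight `"` marks with no nesting. The real quote annotator may handle more cases and differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of CoreNLP.
class QuoteSpanSketch {

    /** A quoted span: the text including its quote marks, plus character offsets. */
    static final class Span {
        final String text;
        final int begin; // offset of the opening quote mark
        final int end;   // offset just past the closing quote mark

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // An opening straight double quote, any run of non-quote characters, a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    /** Returns every double-quoted span in the text, in order of appearance. */
    static List<Span> findQuotes(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            spans.add(new Span(m.group(), m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "\"Stanford University\" is located in California. "
                + "It is a great university, founded in 1891.";
        for (Span s : findQuotes(text)) {
            System.out.println(s.text + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```

For the input from the question, this sketch prints `"Stanford University" [0, 21)`: one span, starting at the opening quote mark and ending just past the closing one.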