Running
./corenlp.sh -annotators quote -outputFormat xml -file input.txt
on the modified input file
"Stanford University" is located in California. It is a great university, founded in 1891.
yields the following output:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="CoreNLP-to-HTML.xsl" type="text/xsl"?>
<root>
<document>
<sentences/>
</document>
</root>
Maybe I misunderstood the intended use of this annotator, but I expected it to mark the parts of the sentence that are between the quotation marks (").
When I run the script with the "usual" annotators tokenize,ssplit,pos,lemma,ner, they all work well, but adding quote does not change the output. I am using the stanford-corenlp-full-2015-12-09 release. How can I use the quote annotator, and what is it meant to do?
If you build a StanfordCoreNLP object in Java code and run it with the quote annotator, the final Annotation object will have the quotes.
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PipelineExample {
    public static void main(String[] args) {
        // build a pipeline that runs the quote annotator after tokenize and ssplit
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, quote");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // the same text as in the question, with the double quotes escaped
        String text = "\"Stanford University\" is located in California. It is a great university, founded in 1891.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        // the quotes are stored on the document-level Annotation
        System.out.println(annotation.get(CoreAnnotations.QuotationsAnnotation.class));
    }
}
Currently, none of the outputters (json, xml, text, etc.) output the quotes, which is why adding quote does not change the output you see. I'll make a note that we should add this to the output in future versions.
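As for what the annotator is meant to do: your expectation is right, it is meant to pick out the quoted material in the text. To make that concrete, here is a minimal sketch in plain Java that pulls out the text between straight double quotes with a regular expression. This is only an illustration and an assumption on my part about what you want to see; it is not how the quote annotator is implemented, it does not use CoreNLP at all, and the class name QuoteSketch and the method extractQuotes are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of CoreNLP.
class QuoteSketch {
    // An opening straight double quote, any run of non-quote characters, a closing quote.
    private static final Pattern DOUBLE_QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Returns the text found between each pair of straight double quotes. */
    static List<String> extractQuotes(String text) {
        List<String> quotes = new ArrayList<>();
        Matcher m = DOUBLE_QUOTED.matcher(text);
        while (m.find()) {
            // group(1) is the text inside the quotes, without the quote marks
            quotes.add(m.group(1));
        }
        return quotes;
    }

    public static void main(String[] args) {
        String text = "\"Stanford University\" is located in California. "
                + "It is a great university, founded in 1891.";
        System.out.println(extractQuotes(text)); // prints [Stanford University]
    }
}
```

A regex like this ignores single quotes, curly quotes, and unbalanced quote marks, so treat it as a stopgap for simple input rather than a replacement for the annotator.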