I am not able to split sentences on \n or \r using the Stanford NLP WordsToSentencesAnnotator. I am following the code described here: http://nlp.stanford.edu/software/sutime.shtml, but with a custom sentence splitter:
public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new PTBTokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false,"\n"));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));
    ...
I am using version 1.3.5 of the CoreNLP jar. I also tried \r, \r\n, etc. in place of \n, but nothing seems to work. Any help?
Well, that is not the way I would build a pipeline, but have you tried the following?
WordsToSentencesAnnotator.newlineSplitter(false, "\n");
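If the splitter still ignores line breaks in your version, one fallback is to split the raw text on line breaks yourself and annotate each line separately. The sketch below uses only the JDK and is not a CoreNLP API; the class name LineSplitSketch and the method splitOnLineBreaks are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LineSplitSketch {
    // Hypothetical helper: splits raw text on \r\n, \n, or \r and drops blank lines.
    static List<String> splitOnLineBreaks(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\r\\n|\\n|\\r")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Meet me on Friday.\r\nThe report is due next week\nCall at 3 pm";
        for (String line : splitOnLineBreaks(text)) {
            // Each line could be wrapped in its own Annotation and passed to the pipeline.
            System.out.println(line);
        }
    }
}
```

Each returned string can then be treated as one sentence, at the cost of running the pipeline once per line.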
So, I would try something more like:
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
to interact with the pipeline. According to the Stanford NLP page, "SUTime annotations are provided automatically with the StanfordCoreNLP pipeline by including the ner annotator", so you should be able to accomplish the same thing this way. Your sentence-splitting annotator is ssplit. The following options are available for ssplit (again taken from the Stanford NLP page):