Is there a way to process an already POS-tagged text using Stanford CoreNLP?
For example, I have a sentence in this format:
They_PRP are_VBP hunting_VBG dogs_NNS ._.
and I'd like to annotate it with lemma, ner, parse, etc. while forcing the given POS annotation.
Update: I tried this code, but it's not working.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String sentText = "They_PRP are_VBP hunting_VBG dogs_NNS ._.";
List<CoreLabel> sentence = new ArrayList<>();
String[] parts = sentText.split("\\s");
for (String p : parts) {
    String[] split = p.split("_");
    CoreLabel clToken = new CoreLabel();
    clToken.setValue(split[0]);
    clToken.setWord(split[0]);
    clToken.setOriginalText(split[0]);
    clToken.set(CoreAnnotations.PartOfSpeechAnnotation.class, split[1]);
    sentence.add(clToken);
}
Annotation s = new Annotation(sentText);
s.set(CoreAnnotations.TokensAnnotation.class, sentence);
Annotation document = new Annotation(s);
pipeline.annotate(document);
The POS annotations will certainly be replaced if you include the pos annotator in the pipeline. Instead, remove the pos annotator and add the option -enforceRequirements false. This will allow the pipeline to run even though an annotator which lemma, etc. depend on (the pos annotator) is not present. Add the following line before pipeline instantiation:
props.setProperty("enforceRequirements", "false");
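Put together with the setup from the question, the configuration might look roughly like the sketch below. It only builds the Properties object, so it runs without CoreNLP on the classpath. The class and method names are made up for illustration, and keeping tokenize and ssplit in the list is an assumption carried over from the question, not something verified against hand-built tokens.

```java
import java.util.Properties;

public class PreTaggedPipelineConfig {

    // Hypothetical helper: builds properties for a pipeline that skips the pos annotator.
    static Properties buildProps() {
        Properties props = new Properties();
        // The annotator list from the question with "pos" removed
        // (assumption: the other annotators stay as they were).
        props.setProperty("annotators", "tokenize, ssplit, lemma");
        // Allow the pipeline to be created although lemma's prerequisite (pos) is missing.
        props.setProperty("enforceRequirements", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        System.out.println("annotators = " + props.getProperty("annotators"));
        System.out.println("enforceRequirements = " + props.getProperty("enforceRequirements"));
    }
}
```

The resulting object would then be passed to new StanfordCoreNLP(props) as in the question.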
Of course, behavior is undefined if you venture into this area without setting the proper annotations, so make sure you match the annotations made by the relevant annotator (POSTaggerAnnotator in this case).
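One detail worth checking when you set the tags by hand is how the word_TAG tokens are split. The split("_") call in the question would break on a word that itself contains an underscore. Below is a minimal sketch of a more defensive split, using only the standard library. TaggedTokenParser and parse are hypothetical names, and splitting on the last underscore is an assumption about the input format.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokenParser {

    // Splits whitespace-separated "word_TAG" tokens into {word, tag} pairs.
    // Assumption: the tag is whatever follows the LAST underscore in a token.
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            // Reject tokens with no underscore, an empty word, or an empty tag.
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Not in word_TAG form: " + token);
            }
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("They_PRP are_VBP hunting_VBG dogs_NNS ._.")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Each pair could then feed the CoreLabel setters from the question (setWord, setValue, and the PartOfSpeechAnnotation) in place of split[0] and split[1].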