I would like to use Stanford CoreNLP for lemmatization, but there are some words that I do not want lemmatized. Is there a way to provide such an ignore list to the tool? I am following this code, and once the program calls this.pipeline.annotate(document); the lemmas are already assigned, so it would be hard to replace those occurrences afterwards. One solution is to create a mapping list in which each word to be ignored is paired with its lemma, i.e., d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}, and to do the post-processing with this mapping list. But I guess there should be an easier way than this.
Thanks for the help.
I think I found the solution with my friend's help.
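// Assumes `document` is the Annotation already passed to pipeline.annotate(document)
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // Iterate over all tokens in a sentence
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // Print the original surface form and its lemma, separated by a tab
        System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
        System.out.println(token.get(LemmaAnnotation.class));
    }
}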
You can get the original form of the word by calling token.get(OriginalTextAnnotation.class).
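To connect this back to the ignore list from the question, here is a minimal sketch of how the choice could be made per token. It is plain Java with no CoreNLP dependency: the class LemmaIgnoreList and the helper chooseForm are hypothetical names of mine, and the two string arguments are assumed to come from token.get(OriginalTextAnnotation.class) and token.get(LemmaAnnotation.class) in the loop above. I am also assuming the ignore list is stored in lower case and should match case-insensitively.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class LemmaIgnoreList {

    // Hypothetical helper (not part of CoreNLP): returns the original word
    // when it is on the ignore list, otherwise returns the lemma.
    // Assumes the entries in `ignore` are stored in lower case.
    static String chooseForm(String originalText, String lemma, Set<String> ignore) {
        if (originalText != null
                && ignore.contains(originalText.toLowerCase(Locale.ROOT))) {
            return originalText;
        }
        return lemma;
    }

    public static void main(String[] args) {
        // Example ignore list; in the real loop the word/lemma pairs would
        // come from the CoreLabel tokens instead of string literals.
        Set<String> ignore = new HashSet<>(Arrays.asList("data", "media"));

        System.out.println(chooseForm("media", "medium", ignore));  // prints "media"
        System.out.println(chooseForm("running", "run", ignore));   // prints "run"
    }
}
```

Inside the token loop you would then print or store chooseForm(original, lemma, ignore) instead of the lemma itself, so no separate word-to-lemma mapping list is needed.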