stanford-nlp

Ignore words for lemmatizer


I would like to use Stanford CoreNLP for lemmatization, but there are some words that I do not want lemmatized. Is there a way to provide such an ignore list to the tool? I am following this code, and once the program calls this.pipeline.annotate(document); the lemmas are already computed, so it would be hard to replace the affected occurrences afterwards. One workaround is to build a mapping list in which each word to be ignored is paired with its lemma, i.e., d = {(w1, lemmatize(w1)), (w2, lemmatize(w2)), ...}, and to post-process the output with this mapping. But I guess there should be an easier way.

Thanks for the help.


Solution

  • I think I found the solution with my friend's help.

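        // Needs edu.stanford.nlp.ling.CoreAnnotations.*, edu.stanford.nlp.ling.CoreLabel
        // and edu.stanford.nlp.util.CoreMap. `sentences` is assumed to be the list
        // returned by document.get(SentencesAnnotation.class) after annotate(document).
        for (CoreMap sentence : sentences) {
            // Iterate over all tokens in the sentence
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                // Print the word as it appeared in the input, then its lemma
                System.out.print(token.get(OriginalTextAnnotation.class) + "\t");
                System.out.println(token.get(LemmaAnnotation.class));
            }
        }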
    

    You can get the original form of each word by calling token.get(OriginalTextAnnotation.class). For a word on the ignore list, use that value in place of token.get(LemmaAnnotation.class).
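To make the ignore-list step concrete, here is a minimal sketch of the selection logic on its own, without the CoreNLP dependency. The class name IgnoreListLemmas and the helpers chooseForm and chooseForms are hypothetical names for illustration, not part of CoreNLP. The sketch assumes the ignore list is stored in lowercase and that the original text and lemma of each token have already been read from the pipeline as shown above. The lemmas in main are hand-written stand-ins, not real CoreNLP output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class IgnoreListLemmas {

    // Hypothetical helper: keep the original text for words on the ignore list,
    // otherwise return the lemma. Assumes the ignore set holds lowercase entries.
    static String chooseForm(String originalText, String lemma, Set<String> ignore) {
        boolean ignored = ignore.contains(originalText.toLowerCase(Locale.ROOT));
        return ignored ? originalText : lemma;
    }

    // Applies chooseForm position by position to two parallel lists.
    static List<String> chooseForms(List<String> originals, List<String> lemmas,
                                    Set<String> ignore) {
        if (originals.size() != lemmas.size()) {
            throw new IllegalArgumentException("originals and lemmas must have the same length");
        }
        List<String> result = new ArrayList<>(originals.size());
        for (int i = 0; i < originals.size(); i++) {
            result.add(chooseForm(originals.get(i), lemmas.get(i), ignore));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> ignore = Set.of("glasses");
        // Stand-in values; in the real loop these would come from
        // token.get(OriginalTextAnnotation.class) and token.get(LemmaAnnotation.class).
        List<String> originals = List.of("The", "cats", "wore", "glasses");
        List<String> lemmas = List.of("the", "cat", "wear", "glass");
        System.out.println(chooseForms(originals, lemmas, ignore)); // prints [the, cat, wear, glasses]
    }
}
```

Inside the token loop from the answer, the same check would be a single call such as chooseForm(token.get(OriginalTextAnnotation.class), token.get(LemmaAnnotation.class), ignore), so no separate mapping from lemmas back to words is needed.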