I have a resource where i know exactly the types of words. i have to lemmatize them but for correct results, i have to manually tag them. i could not find any code for manual tagging of words. i m using following code but it returns wrong result. i.e "painting" for "painting" where i expect "paint".
*//...........lemmatization starts........................
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for(edu.stanford.nlp.util.CoreMap sentence: sentences)
{
for(CoreLabel token: sentence.get(TokensAnnotation.class))
{
String word = token.get(TextAnnotation.class);
String lemma = token.get(LemmaAnnotation.class);
System.out.println("lemmatized version :" + lemma);
}
}
//...........lemmatization ends.........................*
i have to run lemmatizer on words and not sentences where pos tagging will be done automatically. so i would first manually tag the words and then find their lemma. help with some sample code or reference to some site would be great help.
If you know the POS tags in advance you can get the lemmata the following way:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Morphology morphology = new Morphology();
Annotation document = pipeline.process(text);
List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);
for(edu.stanford.nlp.util.CoreMap sentence: sentences) {
for(CoreLabel token: sentence.get(TokensAnnotation.class)) {
String word = token.get(TextAnnotation.class);
String tag = ... //get the tag for the current word from somewhere, e.g. an array
String lemma = morphology.lemma(word, tag);
System.out.println("lemmatized version :" + lemma);
}
}
In case you only want to get the lemma of a single word, you don't even have to run CoreNLP for tokenizing and sentence-splitting, so you could just call the lemma function as following:
String tag = "VBG";
String word = "painting";
Morphology morphology = new Morphology();
String lemma = morphology.lemma(word, tag);