java lucene stemming porter-stemmer

Stemming English words with Lucene


I'm processing English text in a Java application and need to stem it. For example, from "amenities" or "amenity" I need to get "amenit".

The function looks like:

String stemTerm(String term){
   ...
}

I've found the Lucene Analyzer, but it looks far more complicated than what I need: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html

Is there a way to use it to stem single words without building an Analyzer? I don't really understand how Analyzers work...

EDIT: I actually need both stemming and lemmatization. Can Lucene do this?


Solution

  • import org.apache.lucene.analysis.PorterStemmer;
    ...
    String stemTerm (String term) {
        PorterStemmer stemmer = new PorterStemmer();
        return stemmer.stem(term);
    }
    

    See here for more details. If stemming is all you want to do, then you should use this instead of Lucene.

    Edit: You should lowercase term before passing it to stem(), because the stemmer's suffix rules only match lowercase letters.
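To see why the lowercasing matters, here is a minimal sketch that uses only the JDK. It is not Lucene's PorterStemmer: the class Step1aSketch and both method names are made up for illustration, and it implements only step 1a of Porter's algorithm (plural stripping), so its output is not a finished stem. It only shows that the suffix rules are case-sensitive.

```java
import java.util.Locale;

/** Illustrative only: implements just step 1a of Porter's algorithm, not a full stemmer. */
final class Step1aSketch {

    /** Applies Porter step 1a (plural stripping) to a term that is assumed to be lowercase. */
    static String step1a(String term) {
        if (term.endsWith("sses")) {
            return term.substring(0, term.length() - 2); // sses -> ss
        }
        if (term.endsWith("ies")) {
            return term.substring(0, term.length() - 2); // ies -> i
        }
        if (term.endsWith("ss")) {
            return term;                                  // ss -> ss (unchanged)
        }
        if (term.endsWith("s")) {
            return term.substring(0, term.length() - 1); // s -> (removed)
        }
        return term;
    }

    /** Lowercases with a fixed locale first, so the suffix rules above can match. */
    static String normalizeThenStem(String term) {
        return step1a(term.toLowerCase(Locale.ROOT));
    }
}
```

With this sketch, Step1aSketch.step1a("AMENITIES") returns the input unchanged because no lowercase suffix matches, while Step1aSketch.normalizeThenStem("AMENITIES") returns "ameniti". Passing Locale.ROOT avoids locale-specific surprises, such as the Turkish dotless i, that the no-argument toLowerCase() can produce.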