Tags: lucene, indexing, stemming, porter-stemmer

PorterStemmer in Lucene


I am looking for help on how I can use the class PorterStemFilter in Lucene 4.0. Below is my indexer taken from http://www.lucenetutorial.com/lucene-in-5-minutes.html:

...

  StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
  Directory index = new RAMDirectory();
  IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

  IndexWriter w = new IndexWriter(index, config);
  addDoc(w, "Lucene in Action", "193398817");
  addDoc(w, "Lucene for Dummies", "55320055Z");

......

Could someone help me with where and how to use the PorterStemFilter class?


Solution

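  • Filters are generally incorporated into an Analyzer. To create your own Analyzer, the only thing you really need to override is the method that builds the token stream. In Lucene 4.x that method is createComponents (tokenStream is final there; it was the override point in 3.x).

    If you just want to add the stem filter to the StandardAnalyzer chain, I would copy the implementation of createComponents from StandardAnalyzer into your own Analyzer subclass and add the filter at the appropriate location (stemmers should usually be added late in the filter chain, after lowercasing and stop-word removal).

    // Lucene 4.0 API: Version and Analyzer live in lucene-core; the tokenizer,
    // filters and StopAnalyzer come from the lucene-analyzers-common module.
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
        source.setMaxTokenLength(255);
        TokenStream result = new StandardFilter(Version.LUCENE_40, source);
        result = new LowerCaseFilter(Version.LUCENE_40, result);
        result = new StopFilter(Version.LUCENE_40, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        // Adding the stem filter here, after lowercasing and stop-word removal
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }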
    

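    Alternatively, you could just use EnglishAnalyzer (or the equivalent analyzer for another language), which already includes a Porter stemmer. In the indexer from the question, that means passing new EnglishAnalyzer(Version.LUCENE_40) wherever the StandardAnalyzer is used now, and using the same analyzer at query time so that query terms are stemmed the same way.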
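
To illustrate why a stemmer usually goes late in the chain, here is a toy sketch in plain Java with no Lucene dependency. Everything in it is an assumption made for illustration: the class name FilterOrderSketch, the tiny stop list, and the stemStep1a helper, which implements only step 1a of Porter's algorithm (plural suffixes) rather than the full stemmer that PorterStemFilter applies. It runs the same text through a lowercase/stop/stem chain in both orders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FilterOrderSketch {
    // Hypothetical stop list for this sketch; Lucene ships its own English set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("this", "is", "the", "of"));

    // Step 1a of Porter's algorithm only (sses -> ss, ies -> i, ss -> ss, s -> "").
    // Words of one or two letters are left alone, as in Porter's reference code.
    static String stemStep1a(String w) {
        if (w.length() <= 2) return w;
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Lowercases and splits the text, then applies stop-word removal and
    // stemming in the order selected by stemBeforeStop.
    static List<String> analyze(String text, boolean stemBeforeStop) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) continue;
            String term = stemBeforeStop ? stemStep1a(raw) : raw;
            if (STOP_WORDS.contains(term)) continue;
            out.add(stemBeforeStop ? term : stemStep1a(term));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "This is the list of ponies";
        System.out.println(analyze(text, false)); // prints [list, poni]
        System.out.println(analyze(text, true));  // prints [thi, list, poni]
    }
}
```

With the stemmer placed first, "this" is reduced to "thi" before the stop filter sees it, so it no longer matches the stop list and ends up in the index. Placing the stemmer last avoids that.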