I am looking for help on how I can use the class PorterStemFilter in Lucene 4.0. Below is my indexer taken from http://www.lucenetutorial.com/lucene-in-5-minutes.html:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
IndexWriter w = new IndexWriter(index, config);
addDoc(w, "Lucene in Action", "193398817");
addDoc(w, "Lucene for Dummies", "55320055Z");
Could someone help me with where and how to use the PorterStemFilter class?
Filters are generally incorporated into an Analyzer. To create your own Analyzer in Lucene 4.x, the only method you really need to override is createComponents, which builds the Tokenizer and the chain of filters on top of it (in 4.x, tokenStream itself is final and can no longer be overridden).
If you just want to chuck the stem filter into StandardAnalyzer, I would copy the filter chain from StandardAnalyzer's createComponents and add the filter at the appropriate location. Stemmers should usually be added late in the filter chain; PorterStemFilter in particular expects lowercased input, so it belongs after LowerCaseFilter.
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
    TokenStream result = new StandardFilter(Version.LUCENE_40, source);
    result = new LowerCaseFilter(Version.LUCENE_40, result);
    result = new StopFilter(Version.LUCENE_40, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    // Adding the stem filter here, after lowercasing and stop-word removal
    result = new PorterStemFilter(result);
    return new TokenStreamComponents(source, result);
}
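For reference, here is a minimal sketch of a complete analyzer class built that way, written against the Lucene 4.x API (createComponents / TokenStreamComponents). It assumes lucene-core and lucene-analyzers-common 4.0 are on the classpath; the class name StemmingAnalyzer and the analyze helper (which just prints out what the chain produces) are hypothetical names for illustration, not part of Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

/** Hypothetical class: StandardAnalyzer's filter chain plus a Porter stemmer at the end. */
public class StemmingAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
        TokenStream result = new StandardFilter(Version.LUCENE_40, source);
        result = new LowerCaseFilter(Version.LUCENE_40, result);
        result = new StopFilter(Version.LUCENE_40, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        // Stemmer goes last: it must see lowercased, stop-filtered tokens
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }

    /** Hypothetical helper: runs text through an analyzer and collects the resulting terms. */
    public static List<String> analyze(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("title", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            try {
                ts.reset(); // required before the first incrementToken()
                while (ts.incrementToken()) {
                    terms.add(term.toString());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            // A StringReader should not fail; wrap to keep the helper easy to call
            throw new RuntimeException(e);
        }
        return terms;
    }
}
```

With this chain, "Lucene for Dummies" loses the stop word "for", and both "Dummies" and "dummy" are reduced to the same stem "dummi", which is what lets a query for one match the other. In the question's indexer you would pass new StemmingAnalyzer() to IndexWriterConfig in place of the StandardAnalyzer, and give the same analyzer to the QueryParser so query terms get stemmed the same way as indexed terms.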
Alternatively, you could just use EnglishAnalyzer, which already ends its filter chain with a PorterStemFilter (Lucene ships similar language-specific analyzers for many other languages).
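A minimal sketch of that route, assuming Lucene 4.0 with lucene-analyzers-common on the classpath: it is the question's IndexWriter setup with EnglishAnalyzer swapped in for StandardAnalyzer. The class EnglishIndexerSetup and its openWriter method are hypothetical names used only for illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

/** Hypothetical helper: the question's indexer setup using EnglishAnalyzer. */
public class EnglishIndexerSetup {

    public static IndexWriter openWriter(Directory index) {
        // EnglishAnalyzer's default chain already includes PorterStemFilter,
        // so no custom Analyzer subclass is needed
        EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_40);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
        try {
            return new IndexWriter(index, config);
        } catch (IOException e) {
            throw new RuntimeException("could not open IndexWriter", e);
        }
    }
}
```

openWriter(new RAMDirectory()) would stand in for the first four lines of the question's indexer; the addDoc calls stay as they are.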