
What exactly does StandardFilter do in Lucene 5.3.1?


I didn't find any example in the documentation. It just says: "Normalizes tokens extracted with StandardTokenizer."

What does the documentation mean by "normalizes"?


Solution

  • According to the API documentation:

    Normalizes tokens extracted with StandardTokenizer.

    In reality, though, the answer is: Absolutely nothing.

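    import java.io.IOException;
    
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    
    /** Lucene 5.x StandardFilter: a pure pass-through TokenFilter. */
    public class StandardFilter extends TokenFilter {
      public StandardFilter(TokenStream in) {
        super(in);
      }
    
      @Override
      public final boolean incrementToken() throws IOException {
        // Just advance the wrapped stream; no attribute is modified.
        return input.incrementToken(); // TODO: add some niceties for the new grammar
      }
    }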
    

    That's about as simple as a TokenFilter gets: it takes tokens in and spits them right back out again, unchanged.

    In Lucene 2.x it did some real work: handling apostrophes (stripping a trailing 's), removing dots from acronyms, and so on. In 3.x and 4.x that code was kept around for backward compatibility. As of 5.0 the backward-compatibility support has been removed, and the filter no longer does anything at all (though it may well do something in the future).
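To make the difference between the old and new behaviour concrete, here is a plain-Java sketch with no Lucene dependency. The `Token` class, the `TokenType` enum and all method names are hypothetical stand-ins, not Lucene's API. `passThrough` models what the 5.x filter does (nothing), while `legacyNormalize` models the two best-known rules of the old filter: drop a trailing 's from apostrophe tokens and strip dots from acronym tokens. In real Lucene the token type was assigned by the tokenizer's grammar; here it is simply passed in by hand. As far as I know, that old behaviour lives on in Lucene as `ClassicFilter` (paired with `ClassicTokenizer`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model: NOT Lucene's API, just plain Java
// contrasting the 5.x StandardFilter with the pre-5.0 normalization.
class StandardFilterSketch {

    /** Illustrative token types, loosely modelled on the old grammar's types. */
    enum TokenType { ALPHANUM, APOSTROPHE, ACRONYM }

    /** A token is just its text plus the type the tokenizer assigned. */
    static final class Token {
        final String term;
        final TokenType type;

        Token(String term, TokenType type) {
            this.term = term;
            this.type = type;
        }
    }

    /** 5.x behaviour: every token is handed back exactly as received. */
    static List<String> passThrough(List<Token> input) {
        List<String> out = new ArrayList<>();
        for (Token t : input) {
            out.add(t.term); // like `return input.incrementToken();`: no changes
        }
        return out;
    }

    /** Old-style normalization of a single token. */
    static String legacyNormalize(Token t) {
        String s = t.term;
        int len = s.length();
        if (t.type == TokenType.APOSTROPHE && len >= 2
                && s.charAt(len - 2) == '\''
                && (s.charAt(len - 1) == 's' || s.charAt(len - 1) == 'S')) {
            return s.substring(0, len - 2); // "O'Neil's" -> "O'Neil"
        }
        if (t.type == TokenType.ACRONYM) {
            return s.replace(".", "");      // "I.B.M." -> "IBM"
        }
        return s;                           // all other tokens are untouched
    }

    /** Applies the old-style normalization to every token. */
    static List<String> legacy(List<Token> input) {
        List<String> out = new ArrayList<>();
        for (Token t : input) {
            out.add(legacyNormalize(t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("O'Neil's", TokenType.APOSTROPHE),
                new Token("I.B.M.", TokenType.ACRONYM),
                new Token("lucene", TokenType.ALPHANUM));
        System.out.println(passThrough(tokens)); // [O'Neil's, I.B.M., lucene]
        System.out.println(legacy(tokens));      // [O'Neil, IBM, lucene]
    }
}
```

Note that "isn't" typed as an apostrophe token would pass through `legacyNormalize` unchanged, because only a trailing 's (or 'S) is stripped.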