java lucene tokenize lowercase

How to apply LowerCase to a String using Lucene


I'm starting to work with Apache Lucene 8.0 and would like to know how to convert my String text variable to lowercase using Lucene. I couldn't find any examples, so I'm not sure how to do it. What I want is something like this:

public class DocumentLowercase {

    private Analyzer analyzer;

    public Analyzer DocAnalysis(Document d) {

        analyzer = new StandardAnalyzer();
        String text = d.text();

        // Here convert String text into lowercase
        // (maybe using LowerCaseTokenizer? but how?)

        return analyzer;
    }
}

Solution

  • StandardAnalyzer already converts everything to lower case!

    Check the docs here: http://lucene.apache.org/core/8_0_0/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html

    They say:

    Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.

    You can also see in the source code which components a StandardAnalyzer includes:

      @Override
      protected TokenStreamComponents createComponents(final String fieldName) {
        final StandardTokenizer src = new StandardTokenizer();
        src.setMaxTokenLength(maxTokenLength);
        TokenStream tok = new LowerCaseFilter(src);
        tok = new StopFilter(tok, stopwords);
        return new TokenStreamComponents(r -> {
          src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
          src.setReader(r);
        }, tok);
      }
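
To see the lowercasing in action, here is a minimal sketch that runs a string through StandardAnalyzer and collects the terms it emits. The class name LowercaseTokens, the helper analyze, and the field name "body" are made up for illustration; the field name is only a label here, since StandardAnalyzer builds the same chain for every field.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LowercaseTokens {

    // Runs the text through the analyzer and returns the terms it produces.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // "body" is a placeholder field name (an assumption for this sketch).
        try (TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                    // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());    // term text after all filters have run
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // prints [hello, lucene, world]
            System.out.println(analyze(analyzer, "Hello Lucene WORLD"));
        }
    }
}
```

Note that the result is a list of lowercased tokens, not a lowercased copy of the original string: the analyzer also splits the text and drops punctuation.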
    

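    If you still want to customize your analyzer, look into CustomAnalyzer (package org.apache.lucene.analysis.custom, shipped in the analyzers-common module), which lets you assemble your own tokenizer and filter chain.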
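
As a rough sketch of the CustomAnalyzer route, the following builds an analyzer from a standard tokenizer plus a lowercase filter and nothing else. It assumes the lucene-analyzers-common jar is on the classpath, so that the factory names "standard" and "lowercase" can be resolved; the class name CustomLowercase, the helpers build and analyze, and the field name "body" are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomLowercase {

    // Builds an analyzer that tokenizes and lowercases, with no stop filter.
    static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer("standard")     // factory looked up by name
                .addTokenFilter("lowercase")   // add more filters here as needed
                .build();
    }

    // Collects the terms the analyzer emits for the given text.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // "body" is a placeholder field name (an assumption for this sketch).
        try (TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = build()) {
            // prints [hello, lucene, world]
            System.out.println(analyze(analyzer, "Hello Lucene WORLD"));
        }
    }
}
```

The builder approach keeps the chain declarative, so swapping the tokenizer or appending another filter is a one-line change rather than a new Analyzer subclass.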