java lucene stemming

Indonesian Stemmer Using Lucene


Lucene has an Indonesian stemmer class that I want to make use of, but I don't know how to use that library from Java.

Example: I have a string array containing menjadikan, menjawab, penerbangan.

Can you help me with Java code that produces the stems for such an array?


Solution

  • Here is an example code snippet (based on the Lucene test code) that creates a Lucene analyser using the Indonesian stemmer.

    import java.io.IOException;
    import java.io.Reader;
    
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.id.IndonesianStemFilter;
    
    
      ...
      Analyzer a = new Analyzer() {
        @Override
        public TokenStreamComponents createComponents(
                   String fieldName, Reader reader) {
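          // KeywordTokenizer emits the whole input as a single token
          Tokenizer tokenizer = new KeywordTokenizer(reader);
          // Wrap the tokenizer so that each token it emits is stemmed
          return new TokenStreamComponents(tokenizer, 
                     new IndonesianStemFilter(tokenizer));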
        }
      };
    

    You could also instantiate IndonesianStemmer directly and call its stem method on individual words. For example:

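      import org.apache.lucene.analysis.id.IndonesianStemmer;
      ...
      IndonesianStemmer stemmer = new IndonesianStemmer();
      ...
      char[] chars = "menjadikan".toCharArray();
      // true enables derivational stemming (affixes such as "men-" and "-kan");
      // false strips only particles and possessive pronouns
      int len = stemmer.stem(chars, chars.length, true);
      String stem = new String(chars, 0, len);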
    

    WARNING: the above code is not tested.
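
Since the question starts from an array of words, here is a minimal sketch of how the "edit the char buffer, return the new length" pattern shown above could be applied to every element. It runs without Lucene: InPlaceStemmer is a hypothetical interface introduced only for this illustration, and the toy stemmer in main is a stand-in that drops a trailing "kan"; it is not the Lucene algorithm.

```java
import java.util.Arrays;

class StemArrayDemo {

    /** Hypothetical interface (not part of Lucene): edits the buffer, returns the new length. */
    @FunctionalInterface
    interface InPlaceStemmer {
        int stem(char[] buffer, int length);
    }

    /** Stems a copy of each word and returns the stems in the original order. */
    static String[] stemAll(String[] words, InPlaceStemmer stemmer) {
        String[] stems = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            // toCharArray() returns a fresh buffer, so the input strings are never modified
            char[] buffer = words[i].toCharArray();
            int newLength = stemmer.stem(buffer, buffer.length);
            // Only the first newLength characters belong to the stem
            stems[i] = new String(buffer, 0, newLength);
        }
        return stems;
    }

    public static void main(String[] args) {
        // Toy stand-in for illustration only: removes a trailing "kan", nothing else
        InPlaceStemmer toy = (buf, len) ->
            (len > 3 && new String(buf, len - 3, 3).equals("kan")) ? len - 3 : len;

        String[] words = {"menjadikan", "menjawab", "penerbangan"};
        System.out.println(Arrays.toString(stemAll(words, toy)));
    }
}
```

With Lucene on the classpath, and assuming the stem(char[], int, boolean) signature used in the answer, the toy lambda could presumably be replaced by something like (buf, len) -> stemmer.stem(buf, len, true), where stemmer is an IndonesianStemmer instance.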