java indexing lucene lemmatization

How to use Lucene for lemmatization and removal of French stop words ("empty words")


I'm looking for a way to lemmatize documents written in French and remove their stop words ("empty words") using Lucene in Java. I searched the internet but didn't find any good tutorials.


Solution

  • It's easy: all you need is a FrenchAnalyzer, like this:

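    // FrenchAnalyzer drops the given stop words and normalizes word forms while indexing.
    FrenchAnalyzer analyzer = new FrenchAnalyzer(Version.LUCENE_45, FrenchAnalyzer.getDefaultStopSet());
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_45, analyzer);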
    

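    Stop words ("empty words") are removed with FrenchAnalyzer.getDefaultStopSet(), as in the code above. Lemmatization needs no extra step because word normalization is already built into this analyzer. Strictly speaking, it applies a light French stemmer rather than a dictionary-based lemmatizer, so related word forms are reduced to a common stem. You can see the effect when you look at the most important terms (ranked by tf-idf).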
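    To make the stages concrete, here is a small dependency-free sketch of what such an analysis chain does to a sentence: lowercasing, stripping elided articles, dropping stop words, and reducing word forms. It is only an illustration and not Lucene's implementation. The class name FrenchPipelineSketch, the tiny stop list, and the "strip a trailing s" rule are all hypothetical stand-ins; Lucene's default stop set and its stemmer are far more complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration of an analysis chain; NOT Lucene code.
class FrenchPipelineSketch {

    // Hypothetical mini stop list; Lucene's default French set is much larger.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "le", "la", "les", "de", "des", "du", "un", "une", "et", "dans"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Lowercase, then split on anything that is neither a letter nor an apostrophe.
        for (String raw : text.toLowerCase(Locale.FRENCH).split("[^\\p{L}']+")) {
            // Strip an elided prefix such as l', d' or qu'.
            String token = raw.replaceFirst("^(qu|[ldjnsmt])'", "");
            // Skip empty fragments and stop words.
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            // Naive stand-in for stemming: drop a trailing plural "s".
            if (token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            terms.add(token);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [chat, dorment, arbre]
        System.out.println(analyze("Les chats dorment dans l'arbre"));
    }
}
```

    With the real FrenchAnalyzer you get the same kind of result without writing any of this: the terms stored in the index are already filtered and normalized.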