Tags: lucene, levenshtein-distance, fuzzy-search

Find typo with Lucene


I would like to use Lucene to index and search text. The text can contain mistyped words, names, etc. What is the simplest way of getting Lucene to find a document containing

"this is Licene" 

when user searches for

"Lucene"? 

This is only for a demo app, so we need the simplest possible solution.


Solution

  • Lucene's fuzzy queries are based on Levenshtein edit distance.

    Use a fuzzy query in the QueryParser, with syntax like:

    Lucene~0.5
    

    Or create a FuzzyQuery, passing in the maximum number of edits, something like:

    // "field" is the indexed field name; the term is lowercase to match a lowercasing analyzer
    Query query = new FuzzyQuery(new Term("field", "lucene"), 1);
    

    Note: FuzzyQuery in Lucene 4.x (and later versions) does not support edit distances greater than 2.
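To see why "Licene" is found by a fuzzy query for "Lucene" with a maximum of 1 edit, here is a minimal stdlib-only sketch of plain Levenshtein distance (not Lucene's own code; the class and method names are illustrative). It assumes both terms have been lowercased, as a typical analyzer would do. Note that Lucene's FuzzyQuery by default also counts swapping two adjacent characters as a single edit, which plain Levenshtein does not.

```java
// Illustrative sketch: plain Levenshtein distance (insertions, deletions,
// substitutions) using the classic two-row dynamic-programming table.
// Not Lucene's implementation, which uses Levenshtein automata instead.
public class EditDistanceDemo {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        prev[j] + 1,           // deletion
                        curr[j - 1] + 1),      // insertion
                        prev[j - 1] + cost);   // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Both terms lowercased, as a lowercasing analyzer would index them.
        int d = levenshtein("licene", "lucene");
        System.out.println("distance = " + d);                       // distance = 1
        System.out.println("matches with maxEdits=1: " + (d <= 1));  // true
    }
}
```

Since "licene" and "lucene" differ only in one substituted character, a FuzzyQuery with a maximum of 1 edit is enough for the demo case.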
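The `Lucene~0.5` form uses the older "minimum similarity" notation (a value below 1). As far as I know, Lucene 4.x converts it into a whole number of edits via `FuzzyQuery.floatToEdits`, roughly as (1 - similarity) × term length, capped at 2. The sketch below is a hypothetical stdlib-only re-implementation for illustration, assuming that formula; check your Lucene version's source before relying on it.

```java
// Hypothetical re-implementation, for illustration only, of converting a
// legacy "minimum similarity" value into a maximum number of edits.
// Assumption: mirrors FuzzyQuery.floatToEdits as found in Lucene 4.x.
public class FuzzySlopDemo {

    // Lucene's fuzzy matching supports at most 2 edits.
    static final int MAX_SUPPORTED_EDITS = 2;

    static int similarityToEdits(float minSimilarity, String term) {
        int termLen = term.codePointCount(0, term.length());
        if (minSimilarity >= 1f) {
            // Values like "lucene~1" or "lucene~2" are already edit counts.
            return (int) Math.min(minSimilarity, MAX_SUPPORTED_EDITS);
        } else if (minSimilarity == 0f) {
            return 0; // zero means an exact match, not unlimited edits
        }
        // Allow (1 - similarity) * length edits, capped at the supported maximum.
        return Math.min((int) ((1.0 - minSimilarity) * termLen), MAX_SUPPORTED_EDITS);
    }

    public static void main(String[] args) {
        System.out.println(similarityToEdits(0.5f, "lucene")); // 2 (3 before the cap)
        System.out.println(similarityToEdits(0.8f, "lucene")); // 1
    }
}
```

Under this assumption, `Lucene~0.5` on a six-letter term allows 2 edits, so `lucene~1` (or a FuzzyQuery with 1 edit) is the tighter choice if only single typos should match.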