Tags: search, solr, lucene, full-text-search

LUCENE - Fuzzy search on a word containing a space


The case I'm facing seems very simple, but I honestly can't see a clean solution:

  • Imagine I want to index a text containing "Summertime, and the living is easy" in a Lucene index.

  • I want a search for "summer time" in my UI to find the indexed document containing "Summertime", while keeping all the benefits of a StandardAnalyzer.

I imagined that a FuzzyQuery would suffice, since the edit distance is only 1. But the tokenizer I use splits on spaces, so that approach doesn't work. Which analyzer should I use to make this possible, while keeping all the benefits of a StandardAnalyzer-like analyzer (stop words, the ability to add synonyms, ...)?
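To make the problem concrete: a fuzzy query compares one query term against one indexed term at a time. The plain-Python sketch below is not Lucene code; `levenshtein` and `tokenize` are hypothetical helpers, and it uses plain Levenshtein distance, whereas Lucene's FuzzyQuery also counts transpositions by default and supports at most 2 edits. It shows that once "summer time" is split on the space, neither word is close to "summertime", while the unsplit string is a single edit away.

```python
import re


def levenshtein(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions).
    Lucene's FuzzyQuery also counts transpositions by default; ignored here."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]


def tokenize(text):
    """Rough stand-in for a word tokenizer plus lowercasing (not StandardAnalyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


# Each query word on its own is compared with the indexed term "summertime".
for word in tokenize("summer time"):
    print(word, "->", levenshtein(word, "summertime"))   # summer -> 4, time -> 6

# The whole query string, kept as one unit, is only one edit (the space) away.
print("summer time ->", levenshtein("summer time", "summertime"))  # 1
```

Four and six edits are both beyond what a FuzzyQuery accepts, which is why the fix has to happen in analysis rather than in the query type.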

Maybe it's simpler than I think (at least it seems so), but I really can't see any solution for now.


Solution

  • You can use a ShingleFilter to make Solr combine multiple tokens into one, with a user-defined separator.

    That way you'll get "summer time" as a single token, as well as "summer" and "time" (unless you disable outputUnigrams). Since the shingle "summer time" is only one edit (the space) away from the indexed "summertime", the fuzzy search should then work as you want it to.
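A minimal plain-Python sketch of the shingle idea, assuming a simplified analysis chain (lowercasing word tokenizer, no stop words, no synonyms). `analyze` loosely mimics Lucene's ShingleFilter with its defaults (two-token shingles, a single-space separator, unigrams kept), and `fuzzy_match` mimics applying a fuzzy comparison to each analyzed query term. Both names are hypothetical; this illustrates the technique and is not Lucene's or Solr's API.

```python
import re

DOC = "Summertime, and the living is easy"


def levenshtein(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def analyze(text, separator=" ", output_unigrams=True, max_shingle_size=2):
    """Hypothetical stand-in for tokenizer + lowercasing + ShingleFilter:
    emits the single words (if output_unigrams) followed by word n-grams
    of size 2..max_shingle_size joined with `separator`."""
    words = [t for t in re.split(r"\W+", text.lower()) if t]
    tokens = list(words) if output_unigrams else []
    for size in range(2, max_shingle_size + 1):
        for i in range(len(words) - size + 1):
            tokens.append(separator.join(words[i:i + size]))
    return tokens


def fuzzy_match(query_text, doc_text, max_edits=1, separator=" "):
    """True if any analyzed query token is within max_edits of any indexed token.
    Both sides go through the same (shingling) analysis."""
    doc_tokens = set(analyze(doc_text, separator))
    return any(levenshtein(q, t) <= max_edits
               for q in analyze(query_text, separator)
               for t in doc_tokens)


print(analyze("summer time"))              # ['summer', 'time', 'summer time']
print(fuzzy_match("summer time", DOC))     # True: "summer time" vs "summertime"
print(fuzzy_match("winter time", DOC))     # False
```

On the design choice: with the default space separator, the shingle still needs one fuzzy edit to reach "summertime". Setting the separator to an empty string instead (`analyze("summer time", separator="")` yields `"summertime"`) lets an exact term match do the job, at the cost of also gluing together word pairs that were never meant as compounds.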