Search code examples
lucenelucene.netfuzzy-search

How can I perform a fuzzy search for all words provided in a Lucene.net search


I am trying to teach myself Lucene.Net to implement on my site. I understand how to do almost everything I need except for one issue. I am trying to figure out how to allow a fuzzy search for all search terms in a search string.

So for example if I have a document with the string The big red fox, I am trying to get bag fix to match it.

The problem is, it seems like in order to perform fuzzy searches, I have to add ~ to every search term the user enters. I am unsure of the best way to go about this. Right now I am attempting this by

string queryString = "bag rad";
queryString = queryString.Replace("~", string.Empty).Replace(" ", "~ ") + "~";

The first replace is due to Lucene.Net throwing an exception if the search string has a ~ already, apparently it can't handle ~~ in a phrase. This method works, but it seems like it will get messy if I start adding fuzzy weight values.

Is there a better way to default all words to allow for fuzzyness?


Solution

  • You might want to index your documents as bi-grams or tri-grams. Take a look at the CJKAnalyzer to see how they do it. You will want to download the source and look at the source.