Tags: c#, lucene, azure-cognitive-search, fuzzy-search, azure-search-.net-sdk

Fuzzy search using Lucene with Azure Search .NET SDK


I am trying to use Fuzzy search in combination with partial search and match boosting, using the Azure Search .NET API.

This is what I currently have; it doesn't work yet:

// Create the SearchIndexClient ([credentials] is a placeholder for a SearchCredentials instance)
var searchIndexClient = new SearchIndexClient("searchServiceName", "indexName", [credentials]);
// Set search parameters; QueryType.Full enables the full Lucene query syntax
var searchParameters = new SearchParameters(
                includeTotalResultCount: true,
                queryType: QueryType.Full);
// Set search string
string searchText = "elise*~^10";
// Perform the search (await the task to get the results)
var result = await searchIndexClient.Documents.SearchAsync(searchText, searchParameters);

There is an entry in that index with a property Name with value 'Elyse'. This entry is not found using the above code. If I change the searchText to "elyse~", the entry does get returned.

I also could not get this to work in the Azure web portal search explorer (does that thing have a name?).

What am I missing here? I think it may be an issue with escaping, but I am not sure how to fix it. I looked at a bunch of documentation and Stack Overflow questions on the topic, but none showed a complete answer on how to make a fuzzy search call using the .NET SDK. So please respond in the form of complete code if possible. Many thanks in advance.


Solution

  • I haven't compiled your application code, but it looks correct. The issue is that the fuzzy operator has no effect when it is applied to a wildcard term, which is not what you are expecting here.

    There is a note in the documentation that says:

    You cannot use a * or ? symbol as the first character of a search. No text analysis is performed on wildcard search queries. At query time, wildcard query terms are compared against analyzed terms in the search index and expanded.

    This means that specifying a fuzzy operator after a wildcard doesn't have any effect, and the result is the same as not applying it. In your example, elise*~^10 is effectively elise*^10, which is a prefix query and therefore doesn't match "elyse".

    One way to express this in a single query is to use the OR operator: elise~^10 OR elise*^10. This will return the document containing "elyse" because of the first clause.
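
Since the question asks for code, here is a minimal sketch of how the combined query string could be built. The class and method names (FuzzyPrefixQuery, Build) are hypothetical and not part of the Azure Search SDK, and the sketch assumes the term is a single word with no Lucene special characters that would need escaping.

```csharp
using System;

public static class FuzzyPrefixQuery
{
    // Hypothetical helper (not part of the Azure Search SDK): builds a full
    // Lucene query that matches a term either fuzzily or as a prefix.
    // Assumes `term` is a single word with no Lucene special characters.
    public static string Build(string term, int boost)
    {
        // Fuzzy clause, e.g. "elise~^10": matches near spellings.
        string fuzzyClause = $"{term}~^{boost}";
        // Prefix clause, e.g. "elise*^10": matches terms starting with the input.
        string prefixClause = $"{term}*^{boost}";
        // Either clause may match, so join them with OR.
        return $"{fuzzyClause} OR {prefixClause}";
    }
}
```

The resulting string would be passed as searchText to Documents.SearchAsync exactly as in the question's code, with queryType: QueryType.Full still set so that the Lucene operators are interpreted.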