azure search lucene odata azure-search-.net-sdk

Returning accented as well as unaccented results via Azure Search filters

Does anyone know how to return both unaccented and accented results via an Azure Search filter? For example, the filter query below returns the record named unicorn when I look for a record with the name unicorn.

    // SearchAsync returns a Task, so await it to get the actual results
    var result = await searchServiceClient.Documents.SearchAsync<myDto>("*", new SearchParameters
    {
        SearchFields = new List<string> { "Name" },
        Filter = "Name eq 'unicorn'"
    });

This is all good, but what I want is a filter that returns the record named unicorn as well as the record named únicorn (note the accented first character), provided that both records exist.

This can be achieved when searching for such a name via the search query, using a language analyzer or the standard ASCII folding analyzer, as mentioned in this link. What I am struggling to find out is how to implement the same thing with Azure Search filters.

Please let me know if anyone has got any solutions around this.

Solution

Filters are applied to the non-analyzed representation of the data, so I don't think there's any way to do any type of linguistic analysis in filters. One way to work around this is to manually create a field which only does lowercasing + ASCII folding (no tokenization) and then issue Lucene queries that look like this:

    "normal search query terms" AND customFilterColumn:"filtérValuèWithÄccents"

Basically the document would both need to match the search terms in any field AND also match the filter term in the “customFilterColumn”. This may not be sufficient for your needs, but at least you understand the art of the possible.
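
As a rough sketch of what such a field could look like, the index definition fragment below declares a custom analyzer built from the keyword tokenizer (which emits the whole value as a single token) plus the lowercase and asciifolding token filters, and attaches it to the filter column. The index name, the `id` key field and the analyzer name `folded_keyword` are made up for illustration; only `Name` and `customFilterColumn` come from the discussion above, and the exact shape may need adjusting for your API version.

    {
      "name": "my-index",
      "fields": [
        { "name": "id", "type": "Edm.String", "key": true },
        { "name": "Name", "type": "Edm.String", "searchable": true, "filterable": true },
        { "name": "customFilterColumn", "type": "Edm.String", "searchable": true, "analyzer": "folded_keyword" }
      ],
      "analyzers": [
        {
          "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
          "name": "folded_keyword",
          "tokenizer": "keyword_v2",
          "tokenFilters": [ "lowercase", "asciifolding" ]
        }
      ]
    }

Because the same analyzer runs on the query term, both unicorn and únicorn should fold to the single token unicorn and match either record. Note that the field-scoped `customFilterColumn:"..."` syntax needs the full Lucene query parser (`queryType=full` in the REST API, or `QueryType = QueryType.Full` in `SearchParameters`).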