c# azure-cognitive-search

Azure Search not working with the special character #


I need to search a caption field (string) for all hashtags that start with #sun (e.g. #sun, #sunny, etc.).

I'm using the Azure Search SDK in .NET.

Below is an extract of the code:

var tag = "#sun";
var searchTerm = tag + "*";

var searchResult = await searchClient.SearchAsync<MyObject>(searchTerm, searchOptions);

If I remove the #, it works, but it returns anything that starts with "sun", even if it's not a hashtag.

I have also tried encoding "#" as specified in the guide, using "%23sun*" instead of "#sun*", but it still doesn't work.


Solution

  • In a comment on your question I asked which analyzers you are using. Based on the behavior you describe, I suspect the field uses an analyzer that strips out special characters (such as #). That means that at indexing time the term "#sun" is indexed as the token "sun". At query time, however, you are using a wildcard ("#sun*"), and terms that contain wildcards are not analyzed, so the query looks for tokens beginning with "#sun" and finds none. More info here: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#impact-of-an-analyzer-on-wildcard-queries
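
To make the mismatch concrete, here is a small self-contained sketch that does not call Azure at all. The two tokenizers are hypothetical stand-ins (one that drops punctuation, one that splits on whitespace only), not the real Lucene analyzers, and `PrefixMatches` only imitates the idea that a wildcard term is compared to indexed tokens exactly as typed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzerMismatchDemo
{
    // Stand-in for an analyzer that strips special characters:
    // keeps only runs of letters and digits, lowercased.
    public static List<string> StripPunctuationTokens(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    // Stand-in for a whitespace analyzer: splits on whitespace only,
    // so "#sunny" stays a single token with its '#'.
    public static List<string> WhitespaceTokens(string text) =>
        text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

    // The prefix is NOT analyzed; it is compared to the tokens as typed.
    public static bool PrefixMatches(IEnumerable<string> indexedTokens, string prefix) =>
        indexedTokens.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));

    public static void Main()
    {
        const string caption = "a #sunny day in the sun";

        var stripped = StripPunctuationTokens(caption);   // a, sunny, day, in, the, sun
        var whitespace = WhitespaceTokens(caption);       // a, #sunny, day, in, the, sun

        Console.WriteLine(PrefixMatches(stripped, "#sun"));   // False: no token kept its '#'
        Console.WriteLine(PrefixMatches(stripped, "sun"));    // True: but "sun" (not a hashtag) matches too
        Console.WriteLine(PrefixMatches(whitespace, "#sun")); // True: "#sunny" was preserved
    }
}
```

This reproduces both symptoms from the question: "#sun*" finds nothing, and "sun*" finds hashtags and plain words alike.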

    Make sure to use an analyzer that preserves all the characters that matter in your documents (such as # in your case). You can test various analyzers using this endpoint: https://learn.microsoft.com/en-us/rest/api/searchservice/test-analyzer
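
If you would rather call the test-analyzer endpoint from the .NET SDK than from REST, something like the sketch below should work with the Azure.Search.Documents package. The endpoint, API key, and index name are placeholders I made up; substitute your own. It prints the tokens each analyzer produces for a sample caption, so you can see which ones keep the #.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

public static class AnalyzerProbe
{
    public static async Task Main()
    {
        // Placeholder values (assumptions): replace with your service, key, and index.
        var indexClient = new SearchIndexClient(
            new Uri("https://<your-service>.search.windows.net"),
            new AzureKeyCredential("<admin-api-key>"));

        var analyzers = new[]
        {
            LexicalAnalyzerName.StandardLucene,
            LexicalAnalyzerName.Whitespace,
            LexicalAnalyzerName.Keyword
        };

        foreach (var analyzer in analyzers)
        {
            // Ask the service how this analyzer tokenizes the sample text.
            var response = await indexClient.AnalyzeTextAsync(
                "<your-index>",
                new AnalyzeTextOptions("a #sunny day", analyzer));

            var tokens = response.Value.Select(t => t.Token);
            Console.WriteLine($"{analyzer}: {string.Join(" | ", tokens)}");
        }
    }
}
```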

    For example, if you want to keep the whole document field as a single token, you could use a keyword analyzer. If you want to only break on whitespace, then you could use a whitespace analyzer. Here's more info on how to best use analyzers: https://learn.microsoft.com/en-us/azure/search/search-analyzers
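
As a rough sketch of the whitespace-analyzer option, the code below defines an index whose caption field keeps "#sunny" as one token and then runs the prefix query from the question. The index name, field names, `MyObject` shape, endpoint, and key are all assumptions for illustration. Also note that the analyzer of an existing field cannot be changed in place, so you would most likely need to create a new index (or a new field) and reindex.

```csharp
using System;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;
using Azure.Search.Documents.Models;

// Hypothetical document shape; the question does not show MyObject.
public class MyObject
{
    [JsonPropertyName("id")] public string Id { get; set; }
    [JsonPropertyName("caption")] public string Caption { get; set; }
}

public static class HashtagSearch
{
    public static async Task Main()
    {
        // Placeholder values (assumptions): replace with your own.
        var indexClient = new SearchIndexClient(
            new Uri("https://<your-service>.search.windows.net"),
            new AzureKeyCredential("<admin-api-key>"));

        var index = new SearchIndex("captions-whitespace", new SearchField[]
        {
            new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
            // Whitespace analyzer: tokens are split on whitespace only, so '#' survives.
            new SearchableField("caption") { AnalyzerName = LexicalAnalyzerName.Whitespace }
        });
        await indexClient.CreateOrUpdateIndexAsync(index);

        // ... upload documents here before querying ...

        SearchClient searchClient = indexClient.GetSearchClient(index.Name);
        var options = new SearchOptions { SearchFields = { "caption" } };

        // The wildcard term is not analyzed, so it must match indexed tokens as typed.
        SearchResults<MyObject> results =
            await searchClient.SearchAsync<MyObject>("#sun*", options);

        await foreach (SearchResult<MyObject> hit in results.GetResultsAsync())
        {
            Console.WriteLine(hit.Document.Caption);
        }
    }
}
```

One thing to check with the test-analyzer call before committing to this: the whitespace analyzer only splits on whitespace, so a hashtag followed directly by punctuation (for example "#sunny,") would be indexed with the comma attached.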