How to search special characters and words with special characters in azure search?


I am using Azure Search with the standard analyzer, and I'm running into the following issue when searching.

My index contains text such as "abc@sakiladb.com".

  1. If I search for just "@", I get no results.

  2. If I search for the first part of the string followed by * (a prefix query), I get no results.

  3. Escaping and URL-encoding the special characters doesn't help in either of the above cases.

Is there any way I can search these strings?


Solution

  • This looks like a consequence of the standard analyzer. If you run the text through the Analyze Text API (https://learn.microsoft.com/en-us/rest/api/searchservice/test-analyzer) with the standard analyzer, you will see that it produces only two tokens, "abc" and "sakiladb.com", and that the "@" is discarded:

        {
            "tokens": [
                {
                    "token": "abc",
                    "startOffset": 0,
                    "endOffset": 3,
                    "position": 0
                },
                {
                    "token": "sakiladb.com",
                    "startOffset": 4,
                    "endOffset": 16,
                    "position": 1
                }
            ]
        }
    

    With the "en.microsoft" analyzer, on the other hand, tokenization works differently: in addition to "abc", "sakiladb", and "com", it emits a token for the whole string "abc@sakiladb.com", which should give you the results you expect for these searches. This is consistent with the documentation on special characters, which notes that the standard analyzer discards most special characters: https://learn.microsoft.com/en-us/azure/search/query-simple-syntax#special-characters

        {
            "tokens": [
                {
                    "token": "abc@sakiladb.com",
                    "startOffset": 0,
                    "endOffset": 16,
                    "position": 0
                },
                {
                    "token": "abc",
                    "startOffset": 0,
                    "endOffset": 3,
                    "position": 0
                },
                {
                    "token": "sakiladb",
                    "startOffset": 4,
                    "endOffset": 12,
                    "position": 1
                },
                {
                    "token": "com",
                    "startOffset": 13,
                    "endOffset": 16,
                    "position": 2
                }
            ]
        }
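To reproduce the analyzer comparison for your own index, you can call the Analyze Text endpoint directly. The sketch below uses only the Python standard library. The service name, index name, API key, and the `2023-11-01` api-version are placeholder assumptions, and `build_analyze_request`, `extract_tokens`, and `analyze` are hypothetical helper names, not part of any Azure SDK. "standard.lucene" is the built-in name of the standard analyzer.

```python
import json
import urllib.request

# Assumption: any current GA api-version should work; this one is a placeholder.
API_VERSION = "2023-11-01"


def build_analyze_request(service: str, index: str, api_key: str,
                          text: str, analyzer: str,
                          api_version: str = API_VERSION) -> urllib.request.Request:
    """Build a POST request for the index-level Analyze Text endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}/analyze"
           f"?api-version={api_version}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: an admin API key for the search service.
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def extract_tokens(response: dict) -> list:
    """Keep only the token strings from an Analyze Text response body."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(service: str, index: str, api_key: str,
            text: str, analyzer: str) -> list:
    """Send the request and return the token strings (requires network access)."""
    req = build_analyze_request(service, index, api_key, text, analyzer)
    with urllib.request.urlopen(req) as resp:
        return extract_tokens(json.load(resp))
```

Calling `analyze(...)` once with "standard.lucene" and once with "en.microsoft" on "abc@sakiladb.com" should return token lists like the two outputs above. Keeping request construction separate from the network call makes the helper easy to check without a live service.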
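To see why the extra whole-string token matters, here is a simplified model that checks queries against the two token lists shown above. It assumes a prefix query is lowercased and compared against the indexed tokens without further analysis. This is an approximation of how partial-term queries behave, not Azure Search's actual matching code, and `term_matches` and `prefix_matches` are hypothetical names.

```python
# Token lists copied from the Analyze Text API outputs shown above.
STANDARD_TOKENS = ["abc", "sakiladb.com"]
EN_MICROSOFT_TOKENS = ["abc@sakiladb.com", "abc", "sakiladb", "com"]


def term_matches(tokens: list, term: str) -> bool:
    """Simplified exact-term match: is the lowercased term an indexed token?"""
    return term.lower() in tokens


def prefix_matches(tokens: list, prefix: str) -> bool:
    """Simplified prefix query (prefix*): does any indexed token start with it?"""
    p = prefix.lower()
    return any(tok.startswith(p) for tok in tokens)
```

Under this model, a prefix query such as "abc@saki" followed by * finds a match only in the "en.microsoft" token list, because only that analyzer keeps a token containing the "@". A bare "@" matches no token in either list, since neither analyzer emits "@" as a token of its own. To apply the analyzer for real, the searchable field's `analyzer` property would be set to "en.microsoft"; analyzer assignments can't be changed on an existing field, so this typically means rebuilding the index.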