lucene full-text-search azure-cognitive-search full-text-indexing

Cognitive search wild card search with special characters


We are using Cognitive Search for a search requirement, and I am unable to do a wildcard search on a field that contains special characters.

For example, if the name field on a document has the value asdf, I can use the search text as* and get this document back.

However, if the value of the name field is !asdf, I cannot find the document with a wildcard search. I tried the search terms !as*, \!as*, and /\!as*/. The document is only returned when I search for the full value, !asdf.

I cannot get wildcard search to work when the field value contains special characters. I am using query type full.


Solution

  • If you look at the documentation for Full Lucene mode, you'll see that the exclamation character (!) is considered a special character and it must be escaped with a backslash.

    https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#escaping-special-characters

    Special characters that require escaping include the following: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
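
As a minimal sketch of the escaping rule above, the helper below prefixes each listed character with a backslash and then appends an unescaped * to form a prefix query. The function names `escape_lucene` and `prefix_query` are hypothetical and not part of any Azure SDK.

```python
# Characters the Lucene query syntax documentation lists as requiring escaping.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term: str) -> str:
    """Prefix every Lucene special character in `term` with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


def prefix_query(prefix: str) -> str:
    """Escape a literal prefix, then append an unescaped * wildcard."""
    return escape_lucene(prefix) + "*"


if __name__ == "__main__":
    # The literal prefix "!as" becomes the query text \!as*
    print(prefix_query("!as"))
```

The wildcard is appended after escaping so that it keeps its special meaning while the literal ! does not.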

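    However, escaping alone is not enough for a field that uses the default analyzer. Wildcard terms bypass lexical analysis, while the indexed content does not, so the prefix !as is compared against whatever tokens the analyzer produced at indexing time. You can test an analyzer directly via the REST API. If we analyze the string !asdf using the standard analyzer, the only token in the output is asdf: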

        "tokens": [
            {
                "token": "asdf",
                "startOffset": 1,
                "endOffset": 5,
                "position": 0
            }
        ]
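
The analyzer test described above could be issued roughly as follows. This is a sketch that only builds the request for the Analyze Text REST endpoint without sending it. The service name, index name, API key, and the `2020-06-30` API version are placeholders and assumptions, and `build_analyze_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_analyze_request(service, index, api_key, text, analyzer,
                          api_version="2020-06-30"):
    """Build (but do not send) a POST request to the index's analyze endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/analyze?api-version={api_version}")
    # The body names the text to tokenize and the analyzer to run it through.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; replace them with your own service details.
    req = build_analyze_request("my-service", "my-index", "<api-key>",
                                "!asdf", "standard.lucene")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Running the same request with `"keyword"` as the analyzer is a quick way to compare the tokens each analyzer produces for the same input.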
    

    If you want the exclamation character to be included in your index, you have to use an analyzer that does not strip the ! character. See How to specify analyzers for information on defining an analyzer per property.

    To find a suitable analyzer, you can refer to the Predefined Analyzers Reference. Here you'll find a list of preconfigured analyzers. For this particular case, the keyword analyzer will work.

    The reference describes it as follows: "Treats the entire content of a field as a single token. This is useful for data like zip codes, IDs, and some product names."
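
A field definition that assigns the keyword analyzer might look like the sketch below. The field name `Title` matches the test documents in this answer. The `Edm.String` type and the `searchable` flag are assumptions about the index schema, and `keyword_field` is a hypothetical helper.

```python
import json


def keyword_field(name: str) -> dict:
    """Return a searchable string field definition that uses the keyword analyzer."""
    return {
        "name": name,
        "type": "Edm.String",   # assumed field type
        "searchable": True,     # analyzers only apply to searchable fields
        "analyzer": "keyword",  # keeps the whole value, including "!", as one token
    }


if __name__ == "__main__":
    # Print the JSON fragment that would go into the index's "fields" array.
    print(json.dumps(keyword_field("Title"), indent=2))
```

An analyzer cannot be changed on a field that already exists, so in practice this likely means adding a new field or rebuilding the index.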

    We can test this by uploading two documents, one with a title of "asdf" and another with "!asdf".

    {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "Id": "1",
                "Title": "asdf"
            },
            {
                "@search.action": "mergeOrUpload",
                "Id": "2",
                "Title": "!asdf"
            }
        ]
    }

    Then we query for

    \!as* 
    

    (note the escaped ! character) and get 1 document in the result as expected:

        "@odata.count": 1,
        "value": [
            {
                "@search.score": 1.0,
                "Id": "2",
                "Title": "!asdf"
            }
        ]