Tags: c#, azure-cognitive-search, azure-search-.net-sdk

Azure Cognitive Search - How to filter by fields containing special characters


We are using the Azure Cognitive Search .NET SDK and are trying to $filter by a string field that can contain Search-special characters such as &, as well as single quotes.

We get zero results when filtering against a test case with the kitchen sink of special characters (we excluded | since it's our separator for search.in):

{
  "FirmName": "Crazy Charz Inc. ' + - && ! ( ) { } [ ] ^ \" ~ * ? : \\ /"
  ...
}

When we escape the special characters with \ as asked about here and recommended here, and escape the single quote by doubling it to '' (as revealed in this answer, not in the SDK docs), we still get zero results.

The Filter in our SearchParameters object is set to:

search.in(FirmName, 'Crazy Charz Inc. '' \+ \- \&\& \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/', '|')

(That's how it looks when inspecting the variable in VS; it should be properly escaped.)

We get zero results back.

We've confirmed it's specific to the special characters, because we have plenty of tests with the same field matching other docs that contain no such chars in their value.

Out of curiosity, we tried running it in Search Explorer like this:

$filter=search.in(FirmName, 'Crazy Charz Inc. '' \+ \- \&\& \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/', '|')

When we do so, we get the error:

"Invalid expression: Found an unbalanced bracket expression.\r\nParameter name: $filter"

We've confirmed that the SDK returns an actual zero-results response, not an error (we put an actual unbalanced expression in the filter expression to confirm this).

How can we $filter on values with special chars using the .NET SDK? Is this a bug, or are we doing something wrong?

Note: We are providing a pick list of options and doing an exact match; hence filter, and not search, for this use case. We'll be adding search on other fields later.

Do we need to simply URLEncode all our fields? Ugh...


Solution

  • The issue is that you're applying the escaping rules of a different syntax than the one you're actually using.

    There are three query syntaxes in Azure Cognitive Search, each with its own encoding rules:

    1. Simple query syntax (used in the search parameter; encoding rules described in the docs to which you linked)
    2. Full Lucene query syntax (also used in search, more or less a superset of the Simple query syntax)
    3. OData syntax (used in $filter, $select, and $orderby; documented here).

    The rule about doubling single quotes comes from OData. The other rules you're applying are for Simple query syntax, not OData.

    I wrote a small console app to test this, and I was able to match the expected document using this exact string literal:

    @"search.in(hotelName, 'Crazy Charz Inc. '' + - && ! ( ) { } [ ] ^ "" ~ * ? : \ /', '|')"
    

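    Note that because I'm using a verbatim string, only the quotes need to be escaped, and each is escaped by doubling it: the single quote for OData, the double quote for the C# compiler. Everything else, including the backslash, is passed through as a literal character.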
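
To avoid hand-escaping each value, the OData rule above can be wrapped in a small helper. The following is only a minimal sketch: `ODataFilterHelper`, `EscapeLiteral`, and `BuildSearchIn` are hypothetical names, not part of the Azure Cognitive Search .NET SDK, and the sketch assumes the field name is trusted and that no value contains the chosen delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the Azure Cognitive Search .NET SDK.
public static class ODataFilterHelper
{
    // Inside an OData string literal only the single quote needs escaping,
    // and it is escaped by doubling it. Backslashes, &, +, etc. stay as they are.
    public static string EscapeLiteral(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return value.Replace("'", "''");
    }

    // Builds: search.in(<field>, '<v1><delimiter><v2>...', '<delimiter>')
    // Assumption: fieldName comes from trusted code, not from user input.
    public static string BuildSearchIn(string fieldName, IEnumerable<string> values, char delimiter = '|')
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        var list = values.ToList();

        // A value containing the delimiter would be split into two values by search.in.
        if (list.Any(v => v.IndexOf(delimiter) >= 0))
            throw new ArgumentException("A value contains the delimiter character.", nameof(values));

        string joined = string.Join(delimiter.ToString(), list.Select(EscapeLiteral));
        string delimiterLiteral = EscapeLiteral(delimiter.ToString());
        return $"search.in({fieldName}, '{joined}', '{delimiterLiteral}')";
    }
}
```

Passing the raw FirmName value from the question to `ODataFilterHelper.BuildSearchIn("FirmName", new[] { firmName })` should produce the same filter shape as the literal shown earlier, with only the single quote doubled. The resulting string can then be assigned to the Filter property of SearchParameters as in the question.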