azure-cognitive-search, azure-cognitive-services

How to search for a term with a middle dash in Azure Search?


I'm learning to use Azure Search and I can't find a way to search the ItemId field for a term that contains a middle dash, regardless of whether the term appears at the beginning or in the middle of the value.

I have these fields with data in my index

+-----+--------------------+-------------+
| Cat |       ItemId       | Description |
+-----+--------------------+-------------+
| 100 |  400800-1100103U   | desc item 1 |
| 100 |  400800-11001066   | desc item 2 |
| 100 |  400800-11001068   | desc item 3 |
| 101 |  400800-110010F6   | desc item 4 |
+-----+--------------------+-------------+

This is my index field configuration:

+-------------+-------------+------------+-----------+-----------+------------+
| Field Name  | Retrievable | Filterable | Sortable  | Facetable | Searchable |
+-------------+-------------+------------+-----------+-----------+------------+
| Cat         |    OK       |    OK      |    OK     |    OK     |    X       |
| ItemId      |    OK       |    OK      |    OK     |    OK     |    OK      |
| Description |    OK       |            |           |           |            |
+-------------+-------------+------------+-----------+-----------+------------+

And this is the custom analyzer on the ItemId field, which generates a single token even when the value contains a middle dash.

{
  "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
  "name": "keyword_lowercase",
  "tokenizer": "keyword_v2",
  "tokenFilters": [
    "lowercase"
  ],
  "charFilters": []
}

If I search with this query: $select=RowKey&search=400800-1100*

I get these results:

  • 400800-1100103U
  • 400800-11001066
  • 400800-11001068
  • 400800-110010F6

But if I try to search for a term from the middle of the value, like this: $select=RowKey&search=RowKey:(00800-1100*)~

I get 0 results.
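The difference between the two queries can be reproduced locally. The sketch below is only a rough simulation, not the service's actual implementation: it assumes the analyzer emits exactly one lower-cased token per ItemId value and that a trailing `*` is treated as a prefix match against that whole token. The names `keyword_lowercase`, `INDEX` and `prefix_search` are hypothetical.

```python
# Rough local simulation (assumption: keyword tokenizer + lowercase filter
# produce exactly one lower-cased token per field value).
ITEM_IDS = [
    "400800-1100103U",
    "400800-11001066",
    "400800-11001068",
    "400800-110010F6",
]


def keyword_lowercase(value: str) -> list[str]:
    """Return the whole value as a single lower-cased token."""
    return [value.strip().lower()]


# Hypothetical in-memory "index": original value -> list of tokens.
INDEX = {item_id: keyword_lowercase(item_id) for item_id in ITEM_IDS}


def prefix_search(prefix: str) -> list[str]:
    """Return values that have a token starting with the given prefix."""
    needle = prefix.lower()
    return [
        item_id
        for item_id, tokens in INDEX.items()
        if any(token.startswith(needle) for token in tokens)
    ]


if __name__ == "__main__":
    print(prefix_search("400800-1100"))  # prefix of the single token
    print(prefix_search("00800-1100"))   # starts one character into the token
```

Under these assumptions, `400800-1100*` matches because every token begins with that text, while `00800-1100*` cannot match because no token begins with `00800`.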

So how can I search ItemId for a term with a middle dash, regardless of whether the term is at the beginning or in the middle of the value?


Solution

  • I believe this post answers your question by using a regular expression search, although that approach comes with some considerations. Alternatively, depending on your specific scenario, you could consider fuzzy search, or the Edge N-gram tokenizer with a reverse token filter.
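As a minimal sketch of the regular expression option: the full Lucene query syntax (`queryType=full`) accepts a regular expression between forward slashes, and the pattern has to match the whole token, so a fragment from the middle needs `.*` on both sides. The code below only builds the query string. The service name, index name and `api-version` value are placeholders to replace with your own, the helper names are hypothetical, and the escaped character set is a conservative guess at the regex metacharacters. The fragment is lower-cased on the assumption that the index holds lower-cased tokens, as the `keyword_lowercase` analyzer in the question would produce.

```python
from urllib.parse import urlencode

# Placeholders (assumptions): substitute your own service, index and API version.
SERVICE = "my-service"
INDEX_NAME = "my-index"
API_VERSION = "2020-06-30"

# Conservative set of characters escaped inside the regex (assumption).
_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~/')


def escape_regex(fragment: str) -> str:
    """Backslash-escape characters that may be regex metacharacters."""
    return "".join("\\" + ch if ch in _REGEX_SPECIAL else ch for ch in fragment)


def build_regex_params(field: str, fragment: str) -> dict[str, str]:
    """Build query parameters for a 'contains' style regex search on one field."""
    pattern = "/.*" + escape_regex(fragment.lower()) + ".*/"
    return {
        "api-version": API_VERSION,
        "queryType": "full",  # regex needs the full Lucene syntax
        "search": f"{field}:{pattern}",
    }


def build_url(field: str, fragment: str) -> str:
    """Assemble the GET URL for the hypothetical service and index."""
    base = f"https://{SERVICE}.search.windows.net/indexes/{INDEX_NAME}/docs"
    return base + "?" + urlencode(build_regex_params(field, fragment))


if __name__ == "__main__":
    print(build_url("ItemId", "00800-1100"))
```

Sending the request also needs an `api-key` header, which is left out here. A pattern with a leading `.*` cannot use the term prefix to narrow the search, so it may be slow on large indexes; that is probably one of the considerations mentioned above.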