azure-cognitive-search

Unexpected results with Azure AI Search fuzzy search and a hyphen


I am getting more results from an exact search than from a fuzzy search. This happens when the search input contains a hyphen.

My index has a searchable field called productDescription with the Lucene Standard Analyzer. The index field's details are:

    {
      "name": "productDescription",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    }

I am testing out my search with the index's Search Explorer in the Azure portal before I test anything out in Java.

Here are two queries that work as expected:

    {
      "search": "agent resistant",
      "queryType": "full",
      "count": true
    }

This returns 50 results.

    {
      "search": "agent~ resistant~",
      "queryType": "full",
      "count": true
    }

This returns 100 results, which is expected when fuzzy search is added.

The query I am having trouble with is:

    {
      "search": "ultra-wise",
      "queryType": "full",
      "count": true
    }

This returns 65 results.

When I add fuzzy search:

    {
      "search": "ultra-wise~",
      "queryType": "full",
      "count": true
    }

I get 0 results. I have also tried ultra~ wise~ and ultra wise~, which do return results, but my users will be typing the hyphen.

Is there something else I should be accounting for when using hyphens?


Solution

  • Without fuzzy search, the standard analyzer splits a hyphenated query term at the hyphen and searches for each word separately.

    When you search for "ultra-wise", the query looks for both ultra and wise, and with the default searchMode of any, a document matches if it contains either word. That is why you get 65 results.
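    To make the splitting concrete, here is a rough Java approximation of what the standard analyzer does to the query text. It is only a sketch: the class and method names are hypothetical, and the real Lucene StandardTokenizer follows Unicode word-break rules rather than the simple regular expression used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StandardAnalyzerSketch {

    // Rough approximation of the standard.lucene analyzer: lowercase the text and
    // split on any character that is not a letter or digit. The real Lucene
    // StandardTokenizer follows Unicode word-break rules (UAX #29) and handles
    // cases this sketch ignores, such as numbers like 3.5 or words with apostrophes.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // a leading separator produces an empty first element
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ultra-wise"));    // [ultra, wise]
        System.out.println(analyze("High-strength")); // [high, strength]
    }
}
```

    Both hyphenated inputs come out as two separate terms, which is why the non-fuzzy query can match documents that contain only one of them.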

    You can see this by adding the highlight parameter to the query.

    Below is a sample query and its results from my own index.

    Query:

    {
      "search": "ultra-wise",
      "queryType": "full",
      "count": true,
      "highlight": "productDescription"
    }
    

    Results:

    {
      "@odata.context": "https://axsxnsk.search.windows.net/indexes('azureblob-indexjsons')/$metadata#docs(*)",
      "@odata.count": 3,
      "value": [
        {
          "@search.score": 0.6986222,
          "@search.highlights": {
            "productDescription": [
              "<em>Ultra</em> sharp kitchen knives with ergonomic handles."
            ]
          },
          "productDescription": "Ultra sharp kitchen knives with ergonomic handles.",
          "AzureSearch_DocumentKey": "aHR0canNvbjsxOQ2",
          "metadata_storage_name": "output.json"
        },
        {
          "@search.score": 0.62191015,
          "@search.highlights": {
            "productDescription": [
              "Noise cancelling over ear headphones with <em>ultra</em> clear sound."
            ]
          },
          "productDescription": "Noise cancelling over ear headphones with ultra clear sound.",
          "AzureSearch_DocumentKey": "aHR0cHM6Lyjs10",
          "metadata_storage_name": "output.json"
        },
        {
          "@search.score": 0.593085,
          "@search.highlights": {
            "productDescription": [
              "<em>Ultra</em> fast USB C charging cable for modern devices."
            ]
          },
          "productDescription": "Ultra fast USB C charging cable for modern devices.",
          "AzureSearch_DocumentKey": "aHR0cHM6Ly9jsxMA2",
          "metadata_storage_name": "output.json"
        }
      ]
    }
    

    Here the count is 3: every result contains the word ultra, and none contains wise, because that word is not present in my data.
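    The matching rule behind that count can be sketched in Java as follows. This assumes the default searchMode of any and reuses the same rough tokenizer approximation; it only decides which documents match and ignores scoring. The sample texts are product descriptions that appear in the results in this answer, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SearchModeAnySketch {

    // Rough tokenizer approximation: lowercase, split on non-alphanumerics.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(
                text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        result.remove(""); // drop the empty token a leading separator can produce
        return result;
    }

    // searchMode "any": a document matches if it contains at least one query term.
    static boolean matchesAny(String query, String document) {
        Set<String> docTerms = terms(document);
        for (String q : terms(query)) {
            if (docTerms.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Ultra sharp kitchen knives with ergonomic handles.",
                "Noise cancelling over ear headphones with ultra clear sound.",
                "Ultra fast USB C charging cable for modern devices.",
                "Ultrawise cleaning agent for all surfaces.",
                "Premium high resolution ultrawide monitor for gaming.");
        long count = docs.stream().filter(d -> matchesAny("ultra-wise", d)).count();
        System.out.println(count); // 3: "ultra" matches three documents, "wise" matches none
    }
}
```

    Note that the "Ultrawise" description does not match here, because its indexed term is ultrawise, not ultra.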

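    With fuzzy search, however, the query term is not analyzed (it is only lowercased), so the hyphen is not treated as a word separator. The whole string ultra-wise is compared against the indexed terms and matches terms within an edit distance of 2, which in practice means the combined word (ultrawise) or something very close to it.

    See below.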

    {
      "search": "ultra-wise~",
      "queryType": "full",
      "count": true,
      "highlight": "productDescription"
    }
    

    Now the count is 2, and both results are fuzzy matches.

    {
      "@odata.context": "https://dkdnd.search.windows.net/indexes('azureblob-indexjsons')/$metadata#docs(*)",
      "@odata.count": 2,
      "value": [
        {
          "@search.score": 0.64547044,
          "@search.highlights": {
            "productDescription": [
              "<em>Ultrawise</em> cleaning agent for all surfaces."
            ]
          },
          "productDescription": "Ultrawise cleaning agent for all surfaces.",
          "AzureSearch_DocumentKey": "aHR0cHM6Ly92amdbjsx0",
          "metadata_storage_name": "output.json"
        },
        {
          "@search.score": 0.5647866,
          "@search.highlights": {
            "productDescription": [
              "Premium high resolution <em>ultrawide</em> monitor for gaming."
            ]
          },
          "productDescription": "Premium high resolution ultrawide monitor for gaming.",
          "AzureSearch_DocumentKey": "aHR0cHM6Ly92amdzNvbjs50",
          "metadata_storage_name": "output.json"
        }
      ]
    }
    

    Your documents contain no term like ultrawise, so you get zero results.
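    The edit distances show which terms the fuzzy query can reach. The following Java sketch computes a restricted Damerau-Levenshtein distance (the optimal string alignment variant; treating it as equivalent to the service's fuzzy matching is an assumption) between the lowercased query term ultra-wise and a few indexed terms. The class name is hypothetical.

```java
public class FuzzyDistanceSketch {

    // Restricted Damerau-Levenshtein distance (optimal string alignment):
    // insertions, deletions, substitutions and adjacent transpositions each cost 1.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" -> "ba".
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // The fuzzy term is only lowercased, so the hyphen stays in the query term.
        String query = "ultra-wise";
        for (String term : new String[] {"ultrawise", "ultrawide", "ultra", "wise"}) {
            System.out.println(term + " -> " + distance(query, term));
        }
        // ultrawise -> 1
        // ultrawide -> 2
        // ultra -> 5
        // wise -> 6
    }
}
```

    With at most 2 edits, only ultrawise and ultrawide are in range. The individual tokens ultra and wise are 5 and 6 edits away, so a fuzzy query on the hyphenated term can never reach them.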

    The same thing happens when I search for High-strength:

    {
      "search": "High-strength",
      "queryType": "full",
      "count": true,
      "highlight": "productDescription"
    }
    

    This returns results for each word.

    {
      "@odata.context": "https://qwqwoq.search.windows.net/indexes('azureblob-indexjsons')/$metadata#docs(*)",
      "@odata.count": 2,
      "value": [
        {
          "@search.score": 1.9095788,
          "@search.highlights": {
            "productDescription": [
              "<em>High</em>-<em>strength</em> adhesive for industrial applications."
            ]
          },
          "productDescription": "High-strength adhesive for industrial applications.",
          "AzureSearch_DocumentKey": "aHR0cHM6Ly92amdzYjsy0",
          "metadata_storage_name": "output.json"
        },
        {
          "@search.score": 0.7261542,
          "@search.highlights": {
            "productDescription": [
              "Premium <em>high</em> resolution ultrawide monitor for gaming."
            ]
          },
          "productDescription": "Premium high resolution ultrawide monitor for gaming.",
          "AzureSearch_DocumentKey": "aHR0cHM6Ly92ambjs50",
          "metadata_storage_name": "output.json"
        }
      ]
    }
    

    With fuzzy search, I get zero results, since my data has no term like highstrength.

    {
      "search": "High-strength~",
      "queryType": "full",
      "count": true,
      "highlight": "productDescription"
    }
    

    Results

    {
      "@odata.context": "https://dflckds.search.windows.net/indexes('azureblob-indexjsons')/$metadata#docs(*)",
      "@odata.count": 0,
      "value": []
    }
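    Putting the two pieces together, this Java sketch models the fuzzy query end to end over the six product descriptions that appear in the results above: documents are indexed with the rough tokenizer approximation, the fuzzy term is only lowercased, and a document matches if any of its terms is within 2 edits. The names are hypothetical, and the limit of 2 edits is assumed to be the default for a bare ~.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class FuzzyMatchSketch {

    // Assumed edit limit: "~" without a number allows up to 2 edits.
    static final int MAX_EDITS = 2;

    // Restricted Damerau-Levenshtein (optimal string alignment) distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Rough approximation of standard.lucene indexing: lowercase, split on non-alphanumerics.
    static List<String> indexTerms(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // A fuzzy term bypasses analysis except lowercasing, so the hyphen is kept.
    static long countFuzzyMatches(String rawTerm, List<String> docs) {
        String term = rawTerm.toLowerCase(Locale.ROOT);
        return docs.stream()
                .filter(doc -> indexTerms(doc).stream()
                        .anyMatch(t -> distance(term, t) <= MAX_EDITS))
                .count();
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Ultra sharp kitchen knives with ergonomic handles.",
                "Noise cancelling over ear headphones with ultra clear sound.",
                "Ultra fast USB C charging cable for modern devices.",
                "Ultrawise cleaning agent for all surfaces.",
                "Premium high resolution ultrawide monitor for gaming.",
                "High-strength adhesive for industrial applications.");
        System.out.println(countFuzzyMatches("ultra-wise", docs));    // 2
        System.out.println(countFuzzyMatches("High-strength", docs)); // 0
    }
}
```

    Over these six texts it gives the same counts as the portal results in this answer: 2 for ultra-wise~ and 0 for High-strength~.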
    

    If you want a fuzzy search on each word separately, query ultra~ wise~. To match the exact hyphenated term instead, use a phrase query by wrapping it in quotes: \"ultra-wise\".

    In other words, whatever the user types, split it into words and apply fuzzy search to each one.
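    A minimal Java sketch of that rewriting step, assuming you build the query string yourself before sending it with queryType set to full, as in the portal queries above. The helper name toFuzzyQuery is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class FuzzyQueryBuilder {

    // Hypothetical helper: split raw user input on anything that is not a letter
    // or digit (hyphens, spaces, punctuation) and append "~" to every word,
    // so that "ultra-wise" becomes "ultra~ wise~".
    static String toFuzzyQuery(String userInput) {
        return Arrays.stream(userInput.split("[^\\p{L}\\p{N}]+"))
                .filter(word -> !word.isEmpty())
                .map(word -> word + "~")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toFuzzyQuery("ultra-wise"));      // ultra~ wise~
        System.out.println(toFuzzyQuery("High-strength"));   // High~ strength~
        System.out.println(toFuzzyQuery("agent resistant")); // agent~ resistant~
    }
}
```

    Splitting on every non-alphanumeric character also drops characters that have special meaning in the full Lucene syntax, such as -, + or ~, so the user's input cannot accidentally turn into an operator.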