azure, azure-cognitive-search, fuzzy-search

Fuzzy Search Behaving Stricter Than Expected


I am querying my own Azure Search instance using the REST endpoint POST https://[search].search.windows.net/indexes/[index]/docs/search?api-version=2019-05-06 with the query in the request body, and I am having trouble understanding why my fuzzy searches sometimes find no results.

I have one document that I am trying to look up:

{
  "Company": "MyCompany",
  "City": "Houston",
  "Country": "United States"
}

If I do a search query like "search": "City:Houston" then I get the document as expected. Similarly, if I search with "search": "City:Housto~" then I also get the document.

However, "search": "City:Houst~" does not find ANY documents. I have tried variations of "search": "City:Houst~X", with X ranging from 0.1 to 1.0 and from 0 to 2, but cannot get the query to find the document.

I expected the document to be found as long as I gave some substring of Houston and appended the fuzzy search character ~. Why does Housto~ match Houston but not Houst~?

Update: My post request has the following body:

{
  "search": "CityName:Houst~",
  "filter": "CityName eq 'Houston'",
  "queryType": "full"
}
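
For anyone who wants to reproduce this, here is a minimal sketch of how the request above could be assembled in Python. The helper name build_fuzzy_search_request and its parameters are hypothetical (they are not part of any SDK); the URL shape, the api-version, and the body fields are taken from the question, and the api-key header is the usual way to authenticate against the REST API. Nothing is sent over the network here.

```python
import json


def build_fuzzy_search_request(service, index, api_key, field, term,
                               max_edits=None, filter_expr=None,
                               api_version="2019-05-06"):
    """Return (url, headers, body) for a fielded fuzzy search POST.

    Hypothetical helper for illustration only; it just formats strings.
    """
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    # "term~" uses the default edit distance; "term~1" sets it explicitly.
    fuzzy_term = f"{term}~" if max_edits is None else f"{term}~{max_edits}"
    body = {
        "search": f"{field}:{fuzzy_term}",
        # Fielded and fuzzy syntax need the full Lucene query parser.
        "queryType": "full",
    }
    if filter_expr is not None:
        body["filter"] = filter_expr
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return url, headers, json.dumps(body)


if __name__ == "__main__":
    # Placeholder service, index, and key values.
    url, headers, body = build_fuzzy_search_request(
        "my-search", "my-index", "<api-key>", "CityName", "Houst",
        filter_expr="CityName eq 'Houston'")
    print(url)
    print(body)
```

The returned values can be passed to any HTTP client, for example urllib.request.Request(url, data=body.encode(), headers=headers, method="POST").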

The filter is there because some other documents have since been added to the index with shorter variations of 'Houston'. For example, a document with the CityName 'Houst' has been added, and this document does get returned by the above search. It's good that this document shows up, but what I need is for the document with the full 'Houston' to ALSO be returned, presumably with a lower score than 'Houst'.

One thing I noticed is that using "search": "CityName:Houst~" finds a bunch of documents with houston (lowercase h) but does not find any documents with Houston (uppercase H). I'm confused by this: the two are so similar that I'd expect both to be found by the search.

Update 2: Doing some more research based on the conversation in Joey Cai's answer also led me to this related Stack Overflow question. The problem described in that post, chn not matching China, is basically the same as the root issue that I was having, though Joey's answer is most relevant to my question. My conclusion is that Azure fuzzy search is, by design, not meant to match strings that are a significant edit distance apart (supported by the answer given in the linked question).
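
The conclusion above can be checked with a little arithmetic. This is only a sketch, not the service's implementation: edit_distance is a hypothetical helper that computes a restricted Damerau-Levenshtein (optimal string alignment) distance, and MAX_EDITS = 2 reflects the documented upper bound of the ~ operator. Whether the query term and the indexed term are compared case-sensitively depends on how the field is analyzed, so the case-sensitive comparison below is an assumption made purely to show how one mismatched letter changes the count.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each as one edit. Comparison is case-sensitive.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


MAX_EDITS = 2  # assumed cap on fuzzy edit distance

if __name__ == "__main__":
    pairs = [("Housto", "Houston"), ("Houst", "Houston"), ("Houst", "houston")]
    for query, term in pairs:
        dist = edit_distance(query, term)
        print(f"{query!r} -> {term!r}: distance {dist}, "
              f"within limit: {dist <= MAX_EDITS}")
```

Under these assumptions, Housto is one edit from Houston and Houst is two, but a case mismatch on the first letter pushes Houst to three edits, which is past the limit. That would be consistent with the case-dependent results described above, though it does not prove that this is what the service does.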


Solution

  • I tested this on my side and could reproduce your problem.

    However, when I add "queryType": "full" (not queryType:full), it works well.

    The request URL looks like this:

    https://searchname.search.windows.net/indexes/datasourcename/docs?api-version=2019-05-06&search=Houst~&queryType=full