Search code examples
azure-cognitive-search

In Azure Search, is there a way to scope facet counts to the top parameter?


If I have an index with 10,000,000 documents and search text and ask to retrieve the top 1,000 items, is there a way to scope the facets to those 1,000 items?

My current problem is: We have a very large index with a few different facets, including manufacturer. If I search for a product (WD-40 for instance), that matches a lot of different document and document fields. It finds the product and it is the top scoring match, but because they only make 1 or 2 products, the manufacturer doesn't show up as a top facet option because it is sorted by count.

Is there a way to scope the facets to the top X documents? Or, is there a way to only grab documents which are above a certain @search.score?


Solution

  • The purpose of a refiner is to give users options to narrow down the result set. I would say the $top parameter and returned facets works as it should. Trying to limit the number of refiners to be based on the top 1000 results is a bad idea when we think about it. You'll end up with confusing usability- and recall issues.

    Your query for WD-40 returns a large result set. So large that there are 155347 unique manufacturers listed. I'm guessing you have several million hits. The intent of that query is to return the products called WD-40 (my assumption). But, since you search all properties in all types of content, you end up with various products like doors, hinges, and bikes that might have some text saying that "put some WD-40 on it to stop squeaks".

    I'm guessing that most of the hits you get are irrelevant. Thus, you should either limit the scope of your initial query by default. For example, limit to searching only the title property. Or add a filter to exclude a category of documents (like manuals, price lists, etc.).

    You could also consider submitting different queries from your frontend application. One narrowly scoped query that retrieves the refiners and another, broader query that returns the results.