The reason I ask: we're using Algolia, which cannot return correct facet counts once we have 1-3 million or more items (~50 GB) to search over. Algolia's engineers confirmed this: they optimize for retrieval latency, so they prefer to return less data quickly, with approximate counts, and their focus is mainly full-text search.
I just want to confirm the approach for Azure Search: can we rely on its facet counts, or should we implement faceting ourselves?
The case itself is simple: an eCommerce app (online shop) with a huge number of items (SKUs) for sale, and we'd like to let users search with facet filtering.
Azure Search does not guarantee accurate facet counts unless you request a count greater than or equal to the number of unique values in the faceted field. For example, if you have a category field with 10 unique values, this may return inaccurate counts:
GET /indexes/myindex/docs?facet=category,count:3&api-version=2016-09-01
While this will return accurate counts:
GET /indexes/myindex/docs?facet=category,count:10&api-version=2016-09-01
However, for fields with many unique values, using a large value for count
can have negative performance implications.
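The two requests above can be sketched in code. This is a minimal Python sketch using only the standard library: it builds the facet query URL and then parses the shape of the response, which the REST API returns under the "@search.facets" key. The service name, index name, and the sample payload values are placeholders for illustration, not real data.

```python
# Sketch: build an Azure Search facet query URL with an explicit
# facet count, and parse a sample facet response.
from urllib.parse import urlencode

def build_facet_query_url(service, index, facet_field, facet_count,
                          api_version="2016-09-01"):
    """Build the REST URL for a facet query. Requesting a count that is
    >= the number of unique values in facet_field is what makes the
    returned counts accurate."""
    params = {
        "facet": f"{facet_field},count:{facet_count}",
        "api-version": api_version,
    }
    return (f"https://{service}.search.windows.net/indexes/{index}/docs?"
            + urlencode(params))

# "myservice" / "myindex" are placeholder names.
url = build_facet_query_url("myservice", "myindex", "category", 10)

# The response JSON carries facet results under "@search.facets".
# Parsing a hypothetical sample payload:
sample_response = {
    "@search.facets": {
        "category": [
            {"value": "shoes", "count": 1200},
            {"value": "shirts", "count": 950},
        ]
    },
    "value": [],  # the matching documents themselves
}
facets = sample_response["@search.facets"]["category"]
top_category = max(facets, key=lambda f: f["count"])["value"]
```

You would send the built URL with an `api-key` header; the point here is only that the count you pass in `facet=...,count:N` must cover all unique values if you need exact numbers.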
This is from the official docs on Azure Search facets:
Note that if the count parameter is less than the number of unique terms, the results may not be accurate. This is due to the way faceting queries are distributed across shards. Increasing count generally increases the accuracy of term counts, but at a performance cost.
There is also a discussion on the MSDN forums about facet count accuracy that you might find interesting.