azure-cognitive-search

Cognitive Search faceting storage requirement


The Cognitive Search documentation suggests that, to reduce index size, we should set facetable to false on fields that we won't be faceting on. We are working on a generic application where the fields are created dynamically, and all of them could or should be facetable. For example, we have a generic field called genericField1 in the index; service A could be storing a price in that field while service B stores an id. This design forces all the fields to be strings (which is fine for our architecture) and all of them to be facetable. I am trying to understand the storage implications of such a solution, where fields like id are facetable.


Solution

  • Faceting in Azure Cognitive Search requires a separate data structure in addition to the inverted index that supports searching. This data structure is stored on disk and allows aggregation over field values. It is optimized for fast access to field values rather than for compact storage.

    The size of this data structure grows with the number of facetable fields and with the cardinality of their values, so a facetable field holding mostly unique values (such as an id) costs more than one with only a few distinct values. ACS recommends that you do preliminary testing on your own service setup to get concrete numbers on storage utilization, and then choose a topology that serves your use case.

    Note: Another feature that can result in high storage utilization is complex collections. Make sure you measure the impact if you plan to use both faceting and complex collections.
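
One way to run the preliminary test described above is to create two otherwise identical indexes, one with the generic fields facetable and one without, load the same documents into both, and compare the storageSize value returned by the Get Index Statistics REST endpoint. The following is a minimal Python sketch under several assumptions: the helper names (build_index_definition, stats_url, fetch_storage_size, faceting_overhead) are hypothetical, the api-version is just one GA version and may need adjusting for your service, and the field attributes other than facetable are illustrative choices rather than anything stated in the question.

```python
import json
import urllib.request

# Assumption: a GA REST api-version; use whichever one your service supports.
API_VERSION = "2020-06-30"


def build_index_definition(index_name, generic_field_count, facetable):
    """Build a Create Index request body with N generic string fields.

    Only the `facetable` flag differs between the two test indexes.
    """
    fields = [
        {"name": "id", "type": "Edm.String", "key": True, "facetable": False}
    ]
    for i in range(1, generic_field_count + 1):
        fields.append({
            "name": f"genericField{i}",
            "type": "Edm.String",
            "searchable": True,
            "filterable": True,
            "facetable": facetable,
        })
    return {"name": index_name, "fields": fields}


def stats_url(service_name, index_name):
    """URL of the Get Index Statistics endpoint for one index."""
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/stats?api-version={API_VERSION}"
    )


def fetch_storage_size(service_name, index_name, admin_key):
    """Return the index's reported storage size in bytes (needs network access)."""
    request = urllib.request.Request(
        stats_url(service_name, index_name),
        headers={"api-key": admin_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["storageSize"]


def faceting_overhead(size_with_facets, size_without_facets):
    """Return (extra bytes, extra bytes as a fraction of the non-facetable size)."""
    extra = size_with_facets - size_without_facets
    return extra, extra / size_without_facets
```

The two bodies from build_index_definition would be sent to the service's Create Index endpoint, the same sample documents uploaded to both indexes, and the two fetch_storage_size results passed to faceting_overhead. Index statistics are not refreshed in real time, so it is worth waiting a few minutes after indexing before reading them, and the sample data should have a realistic cardinality for fields like id, since that is what drives the facet structure's size.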