Azure Search: Order by dynamic data


I have an Azure Search index whose documents can "occur" in multiple regions, any number of times. For example, Document1 has 5 occurrences in Region1 and 20 in Region2; Document2 has 54 occurrences in Region1 and 10 in Region3; Document3 has 10 occurrences in Region3. We want to use Azure Search for searching and suggestions, but order the results by the number of occurrences in the user's region. For example, a search for "Document" from a user in Region1 should return Document2, Document1, Document3, in that order, because Document2 has 54 occurrences in that region, Document1 has 5, and Document3 has none.

[
  { 'name': 'Document1', 'regions': ['Region1|5', 'Region2|20'] },
  { 'name': 'Document2', 'regions': ['Region1|54', 'Region3|10'] },
  { 'name': 'Document3', 'regions': ['Region3|10'] }
]

I'm having a hard time figuring out how to structure the index, or whether this is even possible with Azure Search. Note that the number of regions is potentially in the hundreds of thousands. I am OK with replacing regions with center points and using geospatial functions instead, but I still don't see how to lay out the data or query it.

What is the best way to structure the index and how would one make the query possible?


Solution

  • tl;dr - There might be a solution for you, based on some assumptions of mine. Please read on, and if possible confirm or correct those assumptions so that I can give a better answer (if one exists).

    Unfortunately, Azure Search doesn't have an out-of-the-box approach for your scenario. There might be a workaround, however: instead of the regions collection being something like ['Region1|5', 'Region2|20'], you could structure the document so that the collection is ['Region1', 'Region1', ..., 'Region2', 'Region2', ...] (that is, make the collection contain n elements of Region1 and m elements of Region2, where in your case n = 5 and m = 20).

    Then you should be able to search on the region that the user originates from, and I believe the results will be ordered by how many times the queried region occurs in each document's regions collection.
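
    To make the replication concrete, here is a minimal Python sketch that expands the 'Region|count' entries from the question into repeated elements. The helper names (expand_regions, rank_by_region) are made up for illustration, and rank_by_region is only a local stand-in that sorts by raw occurrence count. The real service computes relevance with its own ranking algorithm, so the resulting order is something to verify against your index, not a guarantee.

```python
from typing import Dict, List


def expand_regions(weighted: List[str]) -> List[str]:
    """Turn ['Region1|5', 'Region2|20'] into 5 x 'Region1' followed by 20 x 'Region2'."""
    expanded: List[str] = []
    for entry in weighted:
        # Split on the last '|' so a region name containing '|' still parses.
        region, count = entry.rsplit("|", 1)
        expanded.extend([region] * int(count))
    return expanded


documents = [
    {"name": "Document1", "regions": ["Region1|5", "Region2|20"]},
    {"name": "Document2", "regions": ["Region1|54", "Region3|10"]},
    {"name": "Document3", "regions": ["Region3|10"]},
]

# Documents in the shape that would be uploaded to the index.
index_docs: List[Dict] = [
    {"name": d["name"], "regions": expand_regions(d["regions"])} for d in documents
]


def rank_by_region(docs: List[Dict], region: str) -> List[str]:
    """Local stand-in for the search service: order by how often `region` occurs."""
    ordered = sorted(docs, key=lambda d: d["regions"].count(region), reverse=True)
    return [d["name"] for d in ordered]


print(rank_by_region(index_docs, "Region1"))
# ['Document2', 'Document1', 'Document3']
```

    Note that the expanded collection grows with the occurrence counts (25 elements for Document1 here), which is why this only stays manageable when each document lists few regions.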

    Two points to weigh here:

    1. Alternatively, you could add each region as a column (field) in the search index and query or order by that column's value. However, since you mention there might be hundreds of thousands of regions, this may run into our service limits. If that's not the case, I highly recommend adding each region as a column.
    2. With the string-replication approach, you can have arbitrarily large collections, as I believe Azure Search does not limit the number of elements in a collection. It works best if each document has a sparse set of regions (i.e., there may be hundreds of thousands of regions overall, but any given document enumerates only a few). If that's not the case, this approach might be inefficient and painful to manage.
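
    For the per-region column alternative in point 1, a rough sketch could look like the following. The "occ_" field-naming convention and the helper names are hypothetical, and the request body assumes the "search" and "orderby" keys of the Search Documents REST API (POST form). It only makes sense if the number of regions fits within the index's field limit.

```python
import re
from typing import Dict, List


def region_field_name(region: str) -> str:
    """Hypothetical convention: one numeric field per region, e.g. 'occ_Region1'."""
    # Replace anything that is not an ASCII letter, digit or underscore.
    return "occ_" + re.sub(r"[^A-Za-z0-9_]", "_", region)


def to_index_document(name: str, weighted_regions: List[str]) -> Dict:
    """Convert 'Region|count' entries into one integer field per region."""
    doc: Dict = {"name": name}
    for entry in weighted_regions:
        region, count = entry.rsplit("|", 1)
        doc[region_field_name(region)] = int(count)
    return doc


def build_search_body(search_text: str, user_region: str) -> Dict:
    """Request body that orders results by the user's region field, highest first."""
    return {
        "search": search_text,
        "orderby": f"{region_field_name(user_region)} desc",
    }


print(to_index_document("Document1", ["Region1|5", "Region2|20"]))
print(build_search_body("Document", "Region1"))
```

    Each region field would also need to be declared as a sortable numeric field in the index definition for the ordering to work.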

    Also, FYI, I'd recommend taking a look at the scoring profiles feature, and especially the tag function, to see whether it could be useful for your scenario.
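
    As a starting point for experimenting with that, here is a sketch of what a scoring profile with a tag function and the matching query might look like, written as Python dictionaries. The profile name "regionBoost", the parameter name "userRegion", and the boost value are placeholders I picked. The key names follow my understanding of the REST API's scoring profile and search request shapes, so please check them against the current documentation. Also keep in mind that, as far as I know, the tag function boosts documents that share a tag with the parameter and does not weight by how many times the tag occurs.

```python
from typing import Dict


def region_scoring_profile(field_name: str = "regions",
                           parameter: str = "userRegion",
                           boost: float = 5) -> Dict:
    """Scoring profile fragment for the index definition (placeholder names)."""
    return {
        "name": "regionBoost",
        "functions": [
            {
                "type": "tag",
                "fieldName": field_name,  # field holding the region tags
                "boost": boost,
                "tag": {"tagsParameter": parameter},
            }
        ],
    }


def build_scored_search_body(search_text: str, user_region: str,
                             profile: Dict) -> Dict:
    """Search request body that passes the user's region to the profile."""
    parameter = profile["functions"][0]["tag"]["tagsParameter"]
    return {
        "search": search_text,
        "scoringProfile": profile["name"],
        # Assumed format: "<parameter name>-<value>"
        "scoringParameters": [f"{parameter}-{user_region}"],
    }


profile = region_scoring_profile()
print(build_scored_search_body("Document", "Region1", profile))
```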