I am from the team that runs nuget.org, the package ecosystem for .NET. We use Azure Search to power our search API. Our APIs are public, so third-party customers can use them to analyze our ecosystem or make apps.
We recently had an outage caused by a single customer paging through our search documents using the $skip and $top query parameters in batches of 200 documents at a time. This resulted in Azure Search throttling:
Failed to execute request because the request rate has caused your service to exceed the limits of its provisioned capacity. Reduce the rate of requests, or adjust the number of replicas/partitions. See http://aka.ms/azure-search-throttling for more information.
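For reference, the customer's access pattern was roughly equivalent to the following sketch (hypothetical code, not their actual client; the endpoint, query shape, and corpus size are placeholders):

```csharp
using System;
using System.Net.Http;

// Hypothetical reconstruction of the paging pattern: walk the whole result set
// 200 documents at a time by increasing $skip on every request.
var http = new HttpClient();
const int pageSize = 200;
const int totalDocuments = 500_000;   // assumed corpus size, for illustration only

for (int skip = 0; skip < totalDocuments; skip += pageSize)
{
    // Placeholder endpoint and query shape; only the $skip/$top progression matters here.
    var url = $"https://api.example.org/search?q=*&$skip={skip}&$top={pageSize}";
    string page = await http.GetStringAsync(url);
    Console.WriteLine($"fetched up to {pageSize} documents at offset {skip} ({page.Length} bytes)");
}
```

Each iteration is a fresh query with a larger $skip value, which is what eventually pushed the service past its provisioned capacity.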
Azure Search's throttling affected all customers in that region for 10 minutes, not just the single customer that was paging. We read through Azure Search's throttling documentation, but have the following questions:
Some more information about our service:
Deep paging is indeed a costly operation. Since Azure Search is designed to be distributed, every index is divided into multiple shards, which allows for quick scaling operations. The downside is that the ranked results from each shard have to be merged and re-ranked to produce the final list of results. The number of results to merge increases linearly with the skip value, so that step can become expensive when paging very deep into the results.
As a search service, Azure Search is optimized for quick retrieval of the top documents based on textual relevance. It's unfortunately not the best tool for scenarios where a client simply wants to retrieve a list of all documents in a data source.
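To put rough numbers on that (the shard count here is hypothetical): if an index is spread across 12 shards and a client requests $skip=100,000 with $top=200, each shard has to produce its own top 100,200 candidates, so on the order of 1.2 million scored documents are gathered and merged just to return a single page of 200.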
From what I understand from your post, there are two reasons for the throttling: the volume of requests coming from that single customer, and the cost of the deep paging caused by the large $skip values.
We encourage you to control both. It is not uncommon for our customers to implement their own throttling logic to prevent their own customers from sending an abnormally large number of requests. Even without skip values, a single customer sending enough queries to increase traffic multiple-fold can lead to throttling (I'm not sure whether that was the case here). There are no official guidelines on how to handle queries coming from your client apps. The best approach, in my opinion, would be for your team to run performance tests with realistic workloads to understand the limits of your search service (which depend on the index schema, the number of documents, the types of queries being emitted, etc.). Once you have a good idea of how many queries per second (QPS) your service can handle for your scenarios, you can decide how much of that capacity you are willing to allocate to a single customer at a time, and enforce a limit based on that.
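As an illustration of that last point, here is a minimal sketch of a per-customer limit, assuming a fixed-window counter keyed by client ID; the quota and window length are placeholders you would derive from your own performance tests:

```csharp
using System;
using System.Collections.Concurrent;

// Minimal fixed-window rate limiter keyed by client, sketching the
// "allocate only a slice of your measured QPS to each customer" idea.
// The quota and window passed in are placeholders, not recommended values.
public sealed class PerClientRateLimiter
{
    private readonly int _maxRequestsPerWindow;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, (DateTime WindowStart, int Count)> _state = new();

    public PerClientRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequestsPerWindow = maxRequestsPerWindow;
        _window = window;
    }

    // Returns true if the client is still within its quota for the current window.
    public bool TryAcquire(string clientId)
    {
        var now = DateTime.UtcNow;
        var updated = _state.AddOrUpdate(
            clientId,
            _ => (now, 1),
            (_, current) => now - current.WindowStart >= _window
                ? (now, 1)                               // a new window starts: reset the count
                : (current.WindowStart, current.Count + 1));

        return updated.Count <= _maxRequestsPerWindow;
    }
}

// Hypothetical usage: allow each client at most 10 search requests per second.
//   var limiter = new PerClientRateLimiter(10, TimeSpan.FromSeconds(1));
//   if (!limiter.TryAcquire(clientId)) { /* return HTTP 429 to that client */ }
```

A fixed window is the simplest option; a token bucket or sliding window would smooth out bursts at the window boundaries, but the idea of reserving only a slice of your measured QPS per customer is the same.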
Regarding the deep paging cost: if paging through all documents of the search index is a common scenario for your customers, I would recommend exposing a way to page through all documents directly from the data source (assuming Azure Search is not the primary data store of the documents), and using Azure Search mostly for relevance-related retrieval scenarios.
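For example, if the documents also live in a primary store such as a SQL database (an assumption on my part), a cursor-based endpoint sidesteps deep skips entirely, because each page seeks to a key instead of skipping over N rows. A minimal sketch with hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch of keyset (cursor) paging over the primary data store instead of
// $skip-based paging over the search index. All type and member names are hypothetical.
public record PackageDocument(string Id, string Description);

public static class CatalogPaging
{
    // Returns one page of documents whose key comes after the caller's cursor.
    // The cost stays flat no matter how deep the caller pages, because the store
    // seeks directly to the cursor instead of skipping over earlier rows.
    public static IReadOnlyList<PackageDocument> GetPage(
        IQueryable<PackageDocument> source, string? afterId, int pageSize = 200)
    {
        IQueryable<PackageDocument> query = source;
        if (afterId is not null)
        {
            // In SQL terms: WHERE Id > @afterId (an index seek, not an offset).
            query = query.Where(d => string.Compare(d.Id, afterId) > 0);
        }

        return query.OrderBy(d => d.Id).Take(pageSize).ToList();
    }
}
```

Callers pass the Id of the last document from the previous page as the cursor for the next request, and the search index is only hit for actual relevance queries.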