
Why is it possible to get duplicate results from Azure Search when paging?


Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:

GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc
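To make the paging mechanics concrete, here is a minimal Python sketch that builds the query string for a given zero-based page number. It assumes the `myindex` index from the request above and a hypothetical helper name `page_query`. Like the request above, it leaves out the `api-version` parameter that the real REST API also requires. The point it shows is that each page is only a different `$skip` offset; nothing ties one request to the previous one.

```python
from urllib.parse import urlencode, quote


def page_query(page: int, page_size: int, order_by: str = "rating desc") -> str:
    """Build the docs query for a zero-based page (hypothetical helper)."""
    params = {
        "search": "*",
        "$top": page_size,
        "$skip": page * page_size,  # the only thing that changes between pages
        "$orderby": order_by,
    }
    # quote_via=quote encodes spaces as %20; safe="*$" keeps '*' and '$' readable
    return "/indexes/myindex/docs?" + urlencode(params, safe="*$", quote_via=quote)


print(page_query(1, 15))
```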

Why is this possible? How can it happen? Are there any consistency guarantees when paging?


Solution

  • The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (that is, there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).

    Here is an example of how you might get duplicates. Assume an index with four documents:

    1. { "id": "1", "rating": 5 }
    2. { "id": "2", "rating": 3 }
    3. { "id": "3", "rating": 2 }
    4. { "id": "4", "rating": 1 }

    Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:

    $top=2&$skip=0&$orderby=rating desc
    

    And get these results:

    1. { "id": "1", "rating": 5 }
    2. { "id": "2", "rating": 3 }

    Now you insert a fifth document into the index:

    { "id": "5", "rating": 4 }
    

    Shortly thereafter, you execute a query to fetch the second page of results:

    $top=2&$skip=2&$orderby=rating desc
    

    And get these results:

    1. { "id": "2", "rating": 3 }
    2. { "id": "3", "rating": 2 }

    Notice that you’ve fetched document 2 twice, and document 5 never appears at all. Because the new document 5 has a higher rating than document 2, it now sorts into second place and pushes document 2 down to third place, which falls on the second page.
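The same sequence can be replayed with a small in-memory model. This is a sketch, not Azure Search itself: the hypothetical `fetch_page` function stands in for one independent query by sorting whatever the index contains at that moment and then applying `$skip` and `$top` as a slice.

```python
def fetch_page(index, top, skip):
    """Model one independent query: sort the *current* index, then slice."""
    ordered = sorted(index, key=lambda d: d["rating"], reverse=True)  # $orderby=rating desc
    return ordered[skip:skip + top]  # $skip / $top


index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

page1 = fetch_page(index, top=2, skip=0)
index.append({"id": "5", "rating": 4})  # the index changes between the two requests
page2 = fetch_page(index, top=2, skip=2)

seen = [d["id"] for d in page1 + page2]
print(seen)  # ['1', '2', '2', '3']
```

Document 2 shows up on both pages and document 5 is skipped, matching the walkthrough above.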

    When you rely on document score (either you don't use $orderby or you use $orderby=search.score()), paging can also return duplicate results, because each query might be handled by a different replica, and that replica may have different term and document frequency statistics. Those differences can be enough to change the relative ordering of documents at page boundaries.
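A rough illustration of the replica effect, under loudly stated assumptions: the scores below are invented for illustration, and the two dictionaries simply stand in for two replicas that score the same four documents slightly differently. Real relevance scoring and request routing are more involved. The first page is served by one replica and the second page by the other.

```python
def fetch_page_by_score(scores, top, skip):
    """Sort document ids by score, highest first (like $orderby=search.score() desc)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[skip:skip + top]


# Hypothetical scores for the same documents on two replicas with slightly
# different term/document frequency statistics.
replica_a = {"1": 2.10, "2": 1.52, "3": 1.50, "4": 0.90}
replica_b = {"1": 2.08, "2": 1.49, "3": 1.51, "4": 0.91}

page1 = fetch_page_by_score(replica_a, top=2, skip=0)  # served by replica A
page2 = fetch_page_by_score(replica_b, top=2, skip=2)  # served by replica B
print(page1, page2)  # ['1', '2'] ['2', '4']
```

Documents 2 and 3 swap places on replica B, so document 2 is returned twice and document 3 is never returned, even though the index did not change.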

    For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.