Why does Elasticsearch fetch from + size results from each shard when paginating?

For example, I have two shards, A and B. If I request results with from=20 and size=10, Elasticsearch first gets 30 (20 + 10) results from shard A and 30 (20 + 10) results from shard B, and then picks the final 10 results out of those 60 (30 + 30). I don't understand why. In my opinion, it could get the top 10 results from each shard and then pick the final 10 from those 20 (10 + 10). In other words, it would only need size results from each shard, not from + size, because the final results must be among the top size results of each shard. Why does Elasticsearch do this?

Solution

In your scenario, with from=20 and size=10, each shard returns its own top 30 hits (local ranks 1 to 30), not just 10 hits and not a window starting at offset 20. The coordinating node then merges the 60 candidates, sorts them, discards the first 20, and returns the next 10. What you get back are the hits ranked 21 to 30 globally, not the top 10 of the pool.

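Fetching only the top 10 from each shard cannot work, because the page you asked for is positions 21 to 30 of the global ranking, and those hits can sit anywhere in a shard's local top 30. Take the extreme case where shard A holds the 30 best-scoring documents overall: the global hits 21 to 30 are then shard A's local hits 21 to 30, which a per-shard top 10 would never return. A shard has no way of knowing how its scores compare with those on other shards, so every shard has to return from + size candidates to guarantee that the requested page is contained in the merged set.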
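The argument above can be checked with a small simulation. This is only a sketch of the merge logic in plain Python, not Elasticsearch's actual implementation: the scores are made up, and the names shard_top and paginate are hypothetical helpers introduced for illustration.

```python
import heapq

def shard_top(shard, n):
    """Return the n highest scores held by one shard, best first."""
    return heapq.nlargest(n, shard)

def paginate(shards, from_, size, per_shard):
    """Merge per_shard candidates from every shard, then cut the requested page."""
    merged = sorted(
        (score for shard in shards for score in shard_top(shard, per_shard)),
        reverse=True,
    )
    return merged[from_:from_ + size]

# Hypothetical scores: shard A happens to hold all the best documents.
shard_a = list(range(100, 60, -1))  # 100, 99, ..., 61
shard_b = list(range(40, 0, -1))    # 40, 39, ..., 1
shards = [shard_a, shard_b]

from_, size = 20, 10

# Ground truth: sort every document globally, then take the page.
expected = sorted(shard_a + shard_b, reverse=True)[from_:from_ + size]

# Each shard contributes from + size candidates.
correct = paginate(shards, from_, size, per_shard=from_ + size)

# Each shard contributes only size candidates.
naive = paginate(shards, from_, size, per_shard=size)

print(expected)  # [80, 79, 78, 77, 76, 75, 74, 73, 72, 71]
print(correct)   # [80, 79, 78, 77, 76, 75, 74, 73, 72, 71]
print(naive)     # []
```

With only size candidates per shard, the merged pool has 20 entries, so nothing is left after skipping the first 20. Taking the top 10 of that pool instead would return scores 100 to 91, which is the first page rather than the third.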

This guarantee holds no matter how documents are distributed across the shards or how the results are sorted. The price is that every shard has to produce from + size candidates and the coordinating node has to sort all of them, so the cost of a request grows with the page depth. It may seem counterintuitive, but it is the only way to return a correct page from independently ranked shards.