how does replica shards help improve search performance

I've read most documentation about shards and replicas in Elasticsearch and most of them mention that replica shards "can" help improve search performance. But they don't go into detail on how they could achieve that. What I want to know is how exactly would it improve search performance?

I'm familiar with primary and replica shards in Elasticsearch, what it contains and how replica can provide high availability. I also understand scatter-gather approach when searching across multiple primary shards.

To illustrate my problem, suppose I have a read-only index with 1 primary and 4 replicas (5 nodes cluster). If I perform a complex search query against that index, will

all 5 shards process the search? Or,
is it processed by only one/some of them?

I will perform this heavy search on this index many times (millions).

If it is 1) I don't think this will necessarily improve performance, on the contrary it would just saturate the other nodes with the tasks and slowing down the whole cluster. If it is, 2) how does it determine which node to forward the request to? I understand that the node that receive the request (coordinating node) would forward the request to all nodes that hold the primary shard(s), but how does it work with replicas?

Solution

If I perform a complex search query against that index, will all 5 shards process the search or is it processed by only one/some of them??

Only one shard will be used for your query. However, if at the same time you will start another query, that query will most likely end up on another node, the third query on the third node and so one. So, adding 4 more nodes will not improve latency of a single query, it will however improve throughput of the cluster if you run some of these queries simultaneously.

If instead of creating 4 replicas for a single primary, you created 5 primaries, your query would have been sent to all 5 shards and executed in parallel on each shard. However, in this scenario, sending and merging results from 5 shards adds additional time to the processing. So, you improve latency by working on search in parallel but you add overhead of merging results. So, depending on your data and query it might improve or worsen your overall latency. It will worsen cluster throughput for simulations queries though since you are doing more work comparing to a single shard.

how does it determine which node to forward the request to?

It depends on the version of elasticsearch and some settings. The modern versions of elasticsearch are using Adaptive replica selection by default. The older versions were using round robin, which is still available when you disable adaptive replica selection.