elasticsearch, replication, sharding

Elasticsearch shard replica use in search


I am new to Elasticsearch and am trying to understand how its search process uses shard replicas. From the Elasticsearch documentation, I found that replica shards provide fault tolerance: if a primary shard goes down, any replica shard can be promoted to primary.

But the documentation also mentions that replicas help with load balancing under heavy search traffic, and that each shard can have more than one replica. In that case, how is the replica shard selected to serve a search?

For example, if I have one primary shard and 3 replicas on other nodes (4 copies in total), will each incoming search request be routed to only one of those 4 copies?

I am also looking for a graphical representation of replica shard usage for better understanding.


Solution

  • Old documentation, but still relevant for shard allocation and graphics: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html https://www.elastic.co/guide/en/elasticsearch/guide/current/_how_primary_and_replica_shards_interact.html

    Essentially, replicas are duplicate copies of the data, which raise search throughput and also protect against data loss. As a trade-off, writes are slower, because the cluster must write to the primary shard and then send the operation over the network to each replica shard.

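    Reads scale better because, as you mentioned, search load is balanced across the nodes that hold a copy. Each search is served by only one copy of each shard, primary or replica, so in your example every request goes to just one of the 4 copies. The guide linked here describes round-robin rotation through the copies; Elasticsearch 7.0 and later default to adaptive replica selection, which favors the copy expected to respond fastest. https://www.elastic.co/guide/en/elasticsearch/guide/current/distrib-read.html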
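
To make the load-balancing idea concrete, here is a toy model of round-robin copy selection for the 1 primary + 3 replicas case from the question. It is only a sketch of the idea, not Elasticsearch's actual routing code; the node labels and the `route_searches` helper are made up for illustration.

```python
from collections import Counter
from itertools import cycle

# Hypothetical labels for the four copies of one shard: 1 primary + 3 replicas.
SHARD_COPIES = ["primary@node1", "replica@node2", "replica@node3", "replica@node4"]


def route_searches(copies, request_count):
    """Assign each search request to exactly one copy, rotating round-robin."""
    rotation = cycle(copies)
    return [next(rotation) for _ in range(request_count)]


# Eight searches against four copies: each request lands on a single copy.
assignments = route_searches(SHARD_COPIES, 8)
load_per_copy = Counter(assignments)
for copy_name, served in load_per_copy.items():
    print(f"{copy_name}: {served} searches")
```

Each request is handled by one copy only, and the primary takes its turn like any replica, so adding replicas spreads search traffic over more nodes.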

    Because writes are slower with replicas, it is wise to turn off replica shards (set index.number_of_replicas to 0) for indices that are receiving sizable bulk writes, then turn replicas back on afterwards.
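
As a rough sketch of toggling replicas around a bulk load, the snippet below builds the two update-settings requests (`PUT <index>/_settings` with `index.number_of_replicas`) using only the standard library. The cluster URL, the index name `my-index`, the restored replica count of 1, and the `replica_settings_request` helper are all assumptions for illustration; a secured cluster would also need authentication.

```python
import json
import urllib.request

# Assumptions for illustration: a local unsecured cluster and a hypothetical index name.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT <index>/_settings request that sets the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


disable_replicas = replica_settings_request(ES_URL, INDEX, 0)
restore_replicas = replica_settings_request(ES_URL, INDEX, 1)

# To apply against a running cluster, send each request around the bulk load:
#   urllib.request.urlopen(disable_replicas)
#   ... run the bulk indexing job ...
#   urllib.request.urlopen(restore_replicas)
print(disable_replicas.get_method(), disable_replicas.full_url, disable_replicas.data.decode())
```

While replicas are at 0 the index has no redundancy, so this only makes sense when the data can be re-indexed from its source if a node fails mid-load.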

    Naturally, you might not want data duplicated across all of your nodes if you are sending frequent updates. Consider tuning your replica count and running performance tests to find your ideal balance between routine read and write performance.
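
Before running performance tests, it can help to see what each replica count costs on paper. The sketch below is simple arithmetic: every document is written once per copy, and disk use grows roughly with the number of copies. The shard count and data size are hypothetical example values, and `replication_footprint` is a helper made up for this illustration.

```python
def replication_footprint(primary_shards, replicas, primary_data_gb):
    """Return total shard copies and approximate disk use for a given replica count."""
    copies_per_shard = 1 + replicas  # the primary plus its replicas
    return {
        "replicas": replicas,
        "total_shard_copies": primary_shards * copies_per_shard,
        "writes_per_document": copies_per_shard,
        "approx_disk_gb": primary_data_gb * copies_per_shard,
    }


# Hypothetical index: 5 primary shards holding about 100 GB of primary data.
for replicas in range(0, 4):
    print(replication_footprint(primary_shards=5, replicas=replicas, primary_data_gb=100))
```

These numbers only bound the cost; actual search and indexing performance still has to be measured on your own cluster and workload.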