Tags: docker, elasticsearch, docker-swarm, sharding, docker-swarm-mode

Question on Elasticsearch shards on Docker Swarm


I am planning to configure a 3-node Docker Swarm with the Elastic Stack deployed on it. Once it is configured and the shards are assigned, I have a two-part question:

  1. If we configure the containers to write to persistent (local) storage and one of the containers on the 3 nodes dies, will the shards get rebalanced?

  2. If we spin up a new container as the new 3rd node in place of the one that died, will it read back from the disk like the old one, including the existing data and shards? Will the shards get rebalanced again?

Thanks in advance


Solution

  • Background

    Elasticsearch is a distributed system: primary shards are used to scale an index across multiple data nodes, while replica shards provide better availability and read performance. How these shards and replicas are allocated depends on your index and cluster settings, i.e. how many shards and replicas an index has and how many data nodes are available.
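
    For illustration, here is how the shard and replica counts are set when an index is created (a minimal sketch; the index name my-index and the values shown are assumptions, and Elasticsearch is assumed to be reachable on localhost:9200):

        curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
        {
          "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 1
          }
        }'

    With 3 data nodes, these settings would place one primary shard on each node, plus one replica of each primary on a different node.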

    Coming to your questions:

    1. If we configure the containers to write to persistent (local) storage and one of the containers on the 3 nodes dies, will the shards get rebalanced?

    Yes, provided your settings allow it. But remember that a shard and its replica are never assigned to the same data node, so in some cases the cluster status will be yellow (a replica shard is missing) or red (a primary shard is missing). Please also read about the split-brain problem in Elasticsearch to understand this in detail.
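
    To see this in practice, you can check the cluster status and the shard allocation after a node goes down, using the standard health and cat APIs (again assuming Elasticsearch is reachable on localhost:9200):

        # Overall status: green, yellow (a replica shard is missing)
        # or red (a primary shard is missing)
        curl -X GET "localhost:9200/_cluster/health?pretty"

        # Per-shard view: shows which shards are STARTED, RELOCATING
        # or UNASSIGNED, and on which node each one lives
        curl -X GET "localhost:9200/_cat/shards?v"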

    2. If we spin up a new container as the new 3rd node in place of the one that died, will it read back from the disk like the old one, including the existing data and shards? Will the shards get rebalanced again?

    Yes, of course. As you are not storing the data inside the Docker container itself, Elasticsearch will read the existing data back from disk and, using fsync'd data (which makes recovery very fast), reallocate the shards on the new data node.
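
    As a minimal sketch of the persistent-storage part in swarm mode (the service name es-node, the volume name esdata, and the image tag are assumptions; a real 3-node cluster would also need discovery settings and per-node placement constraints, which are omitted here):

        # Mount the Elasticsearch data directory on a named volume, so a
        # replacement container on the same node reads the existing
        # shards back from disk instead of starting empty
        docker service create --name es-node \
          --mount type=volume,source=esdata,target=/usr/share/elasticsearch/data \
          docker.elastic.co/elasticsearch/elasticsearch:7.10.1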