amazon-web-services, elasticsearch, sharding, diskspace, amazon-elasticsearch

AWS Elasticsearch cluster disk space not balanced across data instances


Background

I have an AWS managed Elasticsearch v6.0 cluster that has 14 data instances.

It has time based indices like data-2010-01, ..., data-2020-01.

Problem

Free storage space is very unbalanced across instances, which I can see in the AWS console:

(screenshot from the AWS console showing free storage space per data instance)
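For reference, the same per-node numbers can be pulled from the cluster itself with the standard `_cat/allocation` API. A minimal sketch, assuming a hypothetical domain endpoint and an access policy that allows these unsigned requests (otherwise they would need to be signed, e.g. with SigV4):

```python
import requests

# Hypothetical endpoint -- replace with your own domain.
ES = "https://my-domain.us-east-1.es.amazonaws.com"

# _cat/allocation reports shard count and disk usage per data node.
resp = requests.get(
    f"{ES}/_cat/allocation",
    params={"v": "true", "bytes": "gb",
            "h": "node,shards,disk.indices,disk.used,disk.avail"},
)
print(resp.text)
```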

I have noticed that this distribution changes every time the AWS service runs through a blue-green deployment, which happens when cluster settings are changed or when AWS releases an update.

Sometimes the blue-green deployment results in one of the instances completely running out of space. When this happens, the AWS service starts another blue-green deployment, which resolves the issue without customer impact. (It does have an impact on my heart rate, though!)

Shard Size

Shard sizes for our indices are in the gigabytes but below the Elasticsearch recommendation of 50 GB. Shard size does vary by index, though: lots of our older indices have only a handful of documents.
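To see that spread of shard sizes directly, here is a minimal sketch against the `_cat/shards` API (same hypothetical endpoint and access assumptions as above):

```python
import requests

ES = "https://my-domain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

# List every shard with its store size; tiny shards from the old,
# nearly-empty indices show up alongside multi-GB ones.
resp = requests.get(
    f"{ES}/_cat/shards",
    params={"format": "json", "bytes": "b", "h": "index,prirep,store,node"},
)
for row in sorted(resp.json(), key=lambda r: int(r["store"] or 0)):
    gb = int(row["store"] or 0) / 2**30
    print(f'{row["index"]:<20} {row["prirep"]}  {gb:8.2f} GB  {row["node"] or "-"}')
```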

Question

The AWS balancing algorithm does not balance well, and the fact that it produces a different distribution each time is unexpected.

My question is: how does the algorithm choose which shards to allocate to which instance, and can I resolve this imbalance myself?


Solution

  • I asked this question of AWS support, who were able to give me a good answer, so I thought I'd share the summary here for others.

    In short:

    • AWS Elasticsearch distributes shards based on shard count rather than shard size, so keep your shard sizes balanced if you can.
    • If you have your cluster configured to be spread across 3 availability zones, make your data instance count divisible by 3.

    My Case

    Each of my 14 instances gets roughly the same number of shards (~100) rather than roughly the same amount of data (~100 GB).

    Remember that I have a lot of relatively empty indices. This translates to a mixture of small and large shards, which causes the imbalance when AWS Elasticsearch (inadvertently) allocates lots of large shards to an instance.

    This is further worsened by the fact that I have my cluster set to be distributed across 3 availability zones and my data instance count (14) is not divisible by 3. Fourteen nodes across 3 zones means a 5/5/4 split, and with two replicas each zone holds a full copy of the data, so the 4-node zone has to fit that copy onto fewer instances.

    Increasing my data instance count to 15 (or decreasing to 12) solved the problem.

    From the AWS Elasticsearch docs on Multi-AZ:

    To avoid these kinds of situations, which can strain individual nodes and hurt performance, we recommend that you choose an instance count that is a multiple of three if you plan to have two or more replicas per index.

    Further Improvement

    On top of the availability zone issue, I suggest keeping index sizes balanced to make it easier for the AWS algorithm.

    In my case I can merge older indices, e.g. data-2019-01 ... data-2019-12 -> data-2019.
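    As a rough sketch of that merge using the standard `_reindex` API (hypothetical endpoint and index names; verify document counts before deleting the monthly indices):

    ```python
    import requests

    ES = "https://my-domain.us-east-1.es.amazonaws.com"  # hypothetical endpoint

    # Copy the twelve monthly indices for 2019 into a single yearly index.
    monthly = [f"data-2019-{m:02d}" for m in range(1, 13)]

    resp = requests.post(
        f"{ES}/_reindex",
        params={"wait_for_completion": "true"},
        json={"source": {"index": monthly}, "dest": {"index": "data-2019"}},
    )
    resp.raise_for_status()
    print(resp.json())  # totals, took, and any failures reported by the reindex
    ```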