Why can't the number of primary shards be changed once an index is created in Elasticsearch?


In Elasticsearch (ES), why can't we change the number of primary shards once an index is created, while we can change the number of replica shards at any time? I understand that changing replica shards involves less overhead than changing primary shards, but it doesn't seem like an impossible task.

Is there a distributed-systems concept or performance gain behind not allowing it?


Solution

  • I was about to try to explain it myself, but I found that it's already properly explained in the official Elasticsearch guide:

    Users often ask why Elasticsearch doesn’t support shard-splitting—the ability to split each shard into two or more pieces. The reason is that shard-splitting is a bad idea:

    • Splitting a shard is almost equivalent to reindexing your data. It’s a much heavier process than just copying a shard from one node to another.

    • Splitting is exponential. You start with one shard, then split into two, and then four, eight, sixteen, and so on. Splitting doesn’t allow you to increase capacity by just 50%.

    • Shard splitting requires you to have enough capacity to hold a second copy of your index. Usually, by the time you realize that you need to scale out, you don’t have enough free space left to perform the split.

    In a way, Elasticsearch does support shard splitting. You can always reindex your data to a new index with the appropriate number of shards (see Reindexing Your Data). It is still a more intensive process than moving shards around, and still requires enough free space to complete, but at least you can control the number of shards in the new index.
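
To make the "almost equivalent to reindexing" point concrete, here is a minimal sketch of how a document could be routed to a primary shard. The guide describes routing as a hash of the routing value (the document ID by default) taken modulo the number of primary shards. The names `shard_for` and `moved_fraction` are hypothetical helpers of mine, and `zlib.crc32` is only a stand-in for the hash Elasticsearch actually uses, so treat this as an illustration rather than the real implementation.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document ID.

    Assumption: crc32 stands in for Elasticsearch's own hash function;
    only the "hash modulo number of primary shards" shape matters here.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1
        for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


# Hypothetical document IDs, used only for the demonstration.
doc_ids = [f"doc-{i}" for i in range(10_000)]

# Same shard count: every document stays where it is.
print("2 -> 2 shards, moved:", moved_fraction(doc_ids, 2, 2))

# Different shard count: many documents now map to a different shard.
print("2 -> 3 shards, moved:", moved_fraction(doc_ids, 2, 3))
```

With a uniformly distributed hash, going from 2 to 3 shards would be expected to relocate about two thirds of the documents, because a document stays put only when its hash gives the same remainder modulo 2 and modulo 3. Existing documents would no longer be found on the shard the formula points to, which is why the shard count is fixed at index creation and why changing it amounts to writing every document again into a new index.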