elasticsearch, replication, sharding

Should I use sharding/replication on a single machine in elasticsearch?


I have a large dataset in an index in elasticsearch. I have only one physical machine, and that is not about to change in the near future.

Is there any point in using sharding and/or replication if I can't have more nodes to run elasticsearch on? Will it still improve performance, or should I stick to having just one shard?


Solution

  • On a single machine, replication doesn't make sense. Replicas serve two purposes: high availability (if the machine holding one copy goes down, requests can still be served from the machine hosting the other copy) and better search performance (a search can be served by any copy of a shard). Neither use case applies on a single machine. Even if you configure replicas, ES will not allocate a replica on the same node as its primary shard, so the replicas simply stay unassigned.

    Multiple primary shards are a more complicated question, because the answer depends on several factors. If you have good disk and RAM available and a huge amount of data, then a single primary shard means one very large shard with large segments. By default, segments larger than about 5 GB are no longer eligible for normal segment merging, and they are difficult to cache. On the other hand, too many small segments also hurt search performance. You should also know that ES searches each shard on its own thread, so more shards for a single index means more threads on the same machine are involved in searching the data. The best approach is to benchmark with your own data and infrastructure and choose what works best for your use case.
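
As a rough illustration of the advice above, here is a minimal Python sketch for setting up such a benchmark. It builds index-creation bodies with `number_of_replicas` set to 0 and a varying `number_of_shards` (both are standard index settings), and includes a small timing helper. The function names, the `bench-N-shards` index names, and the candidate shard counts are hypothetical choices for this example. The sketch does not talk to a cluster: you would send each body with whatever client you use and pass your own query function to the timer.

```python
import json
import statistics
import time
from typing import Callable, Dict, List


def single_node_index_body(primary_shards: int) -> Dict:
    """Build a body for `PUT /<index>` suited to a one-node cluster."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "index": {
                # Fixed at index creation; this is the value to benchmark.
                "number_of_shards": primary_shards,
                # A replica is never placed on the node holding its primary,
                # so on a single node replicas would only stay unassigned.
                "number_of_replicas": 0,
            }
        }
    }


def median_latency_ms(run_query: Callable[[], object], repeats: int = 20) -> float:
    """Call `run_query` repeatedly and return the median time in milliseconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical candidates: one index per shard count, loaded with the same data.
    for shards in (1, 2, 4):
        print(f"PUT /bench-{shards}-shards")
        print(json.dumps(single_node_index_body(shards), indent=2))
```

To compare configurations, load the same data into each index, then call `median_latency_ms` with a function that runs a representative query against one index at a time. Using the median rather than the mean keeps a single slow, cold-cache run from dominating the result.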