I'm playing with Elastic 6.1.1 and testing the limits of the software. If I take an index of ~300GB with 0 replicas on 10 data nodes, and then decide to add a replica, all Elastic instances use the network heavily (but not the CPU). This is normal behaviour :)
But judging by the network graphs, network usage appears to be "capped" at about 160Mbps (20MiB/sec). This limit is strange, as it was the default throttle limit in previous versions of Elastic (indices.store.throttle.max_bytes_per_sec), but this setting was removed starting with Elastic 2.X.
I wonder what this cap is, and how I could remove it.
I tried raising index.merge.scheduler.max_thread_count, with no effect ...
Do you see any other tuning that could be done in this area?
Any feedback welcome !
Have a look at the recovery settings - https://www.elastic.co/guide/en/elasticsearch/reference/6.1/recovery.html - in particular indices.recovery.max_bytes_per_sec (40mb by default in 6.1), which limits the transfer rate of anything related to copying shards from node to node. You can start by increasing it gradually and watching the impact on cluster performance.
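Also, see the shard allocation settings at https://www.elastic.co/guide/en/elasticsearch/reference/6.1/shards-allocation.html#_shard_allocation_settings, which also affect the traffic between nodes when shards are copied, for example by limiting how many recoveries a node runs concurrently.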
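For what it's worth, here is a minimal sketch of how both kinds of settings could be raised through the cluster settings API. The values "80mb" and 4 are purely illustrative, not recommendations; the host URL and the helper names `build_recovery_settings` and `apply_settings` are hypothetical and only there for the example. It assumes the cluster is reachable over plain HTTP without authentication.

```python
import json
import urllib.request


def build_recovery_settings(max_bytes_per_sec, node_concurrent_recoveries=None):
    """Build a transient cluster-settings body for recovery throttling.

    Hypothetical helper; the setting names are the ones documented on the
    recovery and shard allocation pages linked above.
    """
    settings = {"indices.recovery.max_bytes_per_sec": max_bytes_per_sec}
    if node_concurrent_recoveries is not None:
        settings["cluster.routing.allocation.node_concurrent_recoveries"] = (
            node_concurrent_recoveries
        )
    # "transient" settings are lost on a full cluster restart,
    # which is convenient while experimenting.
    return {"transient": settings}


def apply_settings(body, host="http://localhost:9200"):
    """PUT the body to /_cluster/settings (host is an assumed address)."""
    request = urllib.request.Request(
        host + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_recovery_settings("80mb", 4)  # illustrative values only
    print(json.dumps(body, indent=2))
    # Uncomment to send it to a real cluster:
    # print(apply_settings(body))
```

Raising the values in small steps and watching the network graphs between changes, as suggested above, makes it easier to see which of the two settings is the actual bottleneck.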