Tags: elasticsearch, cluster-computing, production

Configure an Elasticsearch cluster with 3 Master nodes and 33 Data nodes on physical servers


I'm using Elasticsearch to handle about 10 TB of data. I have already worked out how many shards and how much RAM, CPU, and disk to use, but now that I'm trying to configure the nodes, I'm confused by the number of settings involved and why each one is needed. Are there guidelines or recommendations for a standard configuration and best practices on this subject? Also, do I need to configure any other node types?


Solution

  • It heavily depends on your use case: is it indexing-heavy or search-heavy, what is the document schema, and what search queries are you going to run? For example, n-gram tokens can easily inflate the resources needed by 10x.

    There are a few general rules, though.

    • You want your shards to be between 20 and 50 GB
    • You want fewer than 20k shards in your cluster
    • You want shards to be distributed evenly across machines
    • You want ~30 GB heap (staying below roughly 32 GB lets the JVM keep using compressed object pointers)
    • You want your heap to take ~50% of RAM
    • You want as much CPU as you can eat
    • You want local (not network-attached) SSDs

    Or, if you want the least hassle possible, you can go with Elastic Cloud, which takes some of the hardware concerns away in exchange for a fee.
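
To make the rules of thumb above concrete, here is a minimal sizing sketch in Python. It is only a back-of-the-envelope calculation, not an official formula. The function name `size_cluster`, the 64 GB of RAM per node, the single replica, and the 40 GB target shard size are all assumptions made for illustration; the 10 TB and 33 data nodes come from the question, and I'm treating 10 TB as the on-disk size of the primary data, which may differ from the raw data size.

```python
import math


def size_cluster(total_data_gb, data_nodes, ram_per_node_gb,
                 replicas=1, target_shard_gb=40):
    """Rough shard and heap plan based on the rules of thumb above.

    All defaults are illustrative assumptions, not recommendations.
    """
    # Number of primary shards so that each one lands near the target size.
    primary_shards = math.ceil(total_data_gb / target_shard_gb)
    # Every replica is a full copy of every primary shard.
    total_shards = primary_shards * (1 + replicas)
    # With an even distribution, no node holds more than this many shards.
    shards_per_node = math.ceil(total_shards / data_nodes)
    # Heap: about half of RAM, capped at about 30 GB.
    heap_gb = min(30, ram_per_node_gb // 2)
    # Shard data stored on each node, before any headroom for merges etc.
    disk_per_node_gb = total_data_gb * (1 + replicas) / data_nodes
    actual_shard_gb = total_data_gb / primary_shards
    return {
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "shards_per_node": shards_per_node,
        "heap_gb": heap_gb,
        "disk_per_node_gb": round(disk_per_node_gb, 1),
        "shard_size_ok": 20 <= actual_shard_gb <= 50,
        "shard_count_ok": total_shards < 20_000,
    }


# 10 TB (taken as 10,000 GB) across 33 data nodes, assuming 64 GB RAM each.
plan = size_cluster(total_data_gb=10_000, data_nodes=33, ram_per_node_gb=64)
for key, value in plan.items():
    print(f"{key}: {value}")
```

The numbers it prints are a starting point only. As noted above, the real requirements depend on your mappings and queries, so measure with a representative sample of your data before committing to a shard count.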