Setting up Elasticsearch with the right settings and configuration on Windows servers for production use


I am new to Elasticsearch. I have implemented Elasticsearch in dev (C#, NEST library) with the default ES settings. My question is about migrating to production with the right configuration. Some facts:

  • The solution is implemented for searching catalog items.
  • The total number of items is currently 5K+ but will reach 25K+ in the near future.
  • The current ES index size is 5MB, so I think it will grow to about 25MB for 25K items.
  • The number of searches per hour is not very high.
  • In PROD there are 2 servers (virtual, Windows 2008 R2 Standard, 4 CPU, 16GB RAM, 100GB space).
  • Both servers are under a load balancer.

Questions:

  1. I would like to know how many shards and replicas I need to configure for decent performance, high reliability, and availability.
  2. What is the recommended node configuration on each server (in terms of master, client, data node, etc.)?
  3. What is the recommended way of deploying with configuration on Windows with fewer manual steps?
  4. Please share your good/bad experiences (also tips and lessons learned) in deploying and maintaining in a Windows environment.
  5. Actually, I don't know what I don't know about moving to production. I may have missed some trivial settings.

Note: I have gone through the guides on how to configure the system and Elasticsearch settings: https://www.elastic.co/guide/en/elasticsearch/reference/current/system-config.html

But I don't know the recommended values to configure. Thanks in advance.


Solution

  • I'll try to give you some general answers, to get you started:

    1. I would like to know how many shards and replicas I need to configure for decent performance, high reliability, and availability.

    What does decent performance mean? A single primary shard should be more than sufficient for 25,000 items and 25MB. At this size, the entire index can effectively live in RAM!

    You can add a replica such that both nodes can service search requests.
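
As a minimal sketch of that layout (one primary shard, one replica), the snippet below builds the JSON body for a create-index request. Python is used only for illustration, `catalog_index_body` is a hypothetical helper, and the body would be sent as `PUT /<your-index-name>` with whichever client you prefer.

```python
import json


def catalog_index_body(primaries: int = 1, replicas: int = 1) -> dict:
    """Build the request body for creating an index (hypothetical helper).

    The keys under "settings.index" are standard Elasticsearch index
    settings; the default values reflect the suggestion above.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,    # fixed at index creation
                "number_of_replicas": replicas,   # can be changed later
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(catalog_index_body(), indent=2))
```

The number of replicas can be adjusted on a live index, whereas the number of primary shards is chosen when the index is created, which is another reason to start with a single primary for a data set this small.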

    NOTE: two master-eligible nodes is not a good number for production purposes. A majority of two votes is two, so if one node is lost the survivor cannot reliably win a master election on its own, and relaxing the quorum to a single vote invites split brain scenarios. For high availability, you would want a minimum of three master-eligible nodes, ideally in separate availability zones within a region, so that the node locations are isolated.
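
The arithmetic behind that note can be sketched in a few lines. This is an illustration of a simple majority quorum, not Elasticsearch's actual election code; `quorum` and `tolerated_failures` are hypothetical helper names.

```python
def quorum(master_eligible: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} master-eligible: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Two nodes need both votes for a majority and so tolerate no failures, which is no better than one node; three nodes need two votes and tolerate the loss of one.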

    2. What is the recommended node configuration on each server (in terms of master, client, data node, etc.)?

    For the amount of data that you're dealing with, three master-eligible nodes would be sufficient and would satisfy high availability. Not all nodes necessarily need to be data nodes, but assuming one primary shard and one replica, at least two nodes would need to be data nodes. The third node can be a master-only node that effectively acts as an arbiter for master election.
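
One way to picture that topology is as plain data with two sanity checks. The node names and the `Node` fields below are illustrative labels only, not Elasticsearch configuration keys; the actual role settings depend on your Elasticsearch version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str              # hypothetical host label
    master_eligible: bool
    data: bool


# Assumed layout: two servers hold data, a third small node only votes.
LAYOUT = [
    Node("server-1", master_eligible=True, data=True),
    Node("server-2", master_eligible=True, data=True),
    Node("arbiter", master_eligible=True, data=False),
]


def check_layout(nodes, replicas: int = 1) -> list:
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    masters = sum(n.master_eligible for n in nodes)
    data_nodes = sum(n.data for n in nodes)
    if masters < 3:
        problems.append("fewer than three master-eligible nodes")
    # A replica is never allocated on the same node as its primary.
    if data_nodes < replicas + 1:
        problems.append("not enough data nodes to place every shard copy")
    return problems


if __name__ == "__main__":
    print(check_layout(LAYOUT) or "layout looks reasonable")
```

Running the same check against only the two existing servers reports the missing third master-eligible node, which is the gap the arbiter fills.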

    3. What is the recommended way of deploying with configuration on Windows with fewer manual steps?

    This is an extremely open-ended question, fraught with opinionated answers! Some examples of what you might use for a Windows environment:

    • PowerShell DSC
    • Puppet
    • Ansible
    • Terraform
    • Cloud-specific deployment solutions, e.g. CloudFormation, Azure Resource Manager

    4. Please share your good/bad experiences (also tips and lessons learned) in deploying and maintaining in a Windows environment.

    • Start with Elasticsearch's default configuration and read the Configuring Elasticsearch documentation. Don't change settings from their defaults unless you really know what you're doing!
    • With 25MB of data, snapshots may not be important, as rebuilding an index of 25,000 items is not going to take long.
    • Assuming this is a search use case, use index aliases with versioned indices. Your application queries the alias, which lets you iterate on an indexing and search strategy until it satisfies your information retrieval needs, without changing the application.
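
The alias approach can be sketched as follows. The index names `catalog-v1` and `catalog-v2`, the alias `catalog`, and the helper `alias_swap_body` are all assumptions made for illustration; the resulting body is what you would send to the `_aliases` endpoint (`POST /_aliases`) with whichever client you use.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request that repoints `alias` in a single call.

    Both actions are applied together, so searches against the alias
    never see a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: the application only ever searches "catalog".
    body = alias_swap_body("catalog", "catalog-v1", "catalog-v2")
    print(json.dumps(body, indent=2))
```

A typical cycle would be to create `catalog-v2` with the revised mappings or analyzers, reindex the items into it, send the swap, and delete `catalog-v1` once the new index has proven itself.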