Tags: windows, elasticsearch, kibana, sharding, serilog

Elasticsearch: How can I reduce the number of primary shards?


I'm experiencing some difficulty with Elasticsearch. Incidentally, I'm running Elasticsearch as a Windows service.

Of note:

  1. I cannot connect to my Elasticsearch cluster (1 node) via Cerebro.
  2. Elasticsearch requests are timing out. I first noticed the timeouts in Kibana, and then began investigating further.
  3. When I restart the Elasticsearch service, it takes a long time to start. In particular, when I run _cat/indices, it takes a long time for the indices to turn from red to yellow.

I ran _cluster/stats?human&pretty and noticed the following:

  "indices": {
    "count": 159,
    "shards": {
      "total": 793,
      "primaries": 793,
      "replication": 0.0,
      "index": {
        "shards": {
          "min": 3,
          "max": 5,
          "avg": 4.987421383647798
        },
        "primaries": {
          "min": 3,
          "max": 5,
          "avg": 4.987421383647798
        },
        "replication": {
          "min": 0.0,
          "max": 0.0,
          "avg": 0.0
        }
      }
    },
    "docs": {
      "count": 664553,
      "deleted": 0
    },
    "store": {
      "size": "525.8mb",
      "size_in_bytes": 551382263
    },
    "fielddata": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "evictions": 0
    },

My Question:

  • Are 793 shards a red flag? Should I adjust this?

UPDATE: I believe I have allocated too many shards.

So my revised question is:

  • How can I remediate this situation where I have allocated too many shards?
    • Specifically, how many shards should I shrink to?
    • And which commands should I issue to reduce the number of shards?
      • Particularly given that it's taking a very long time for my Elasticsearch cluster (i.e. 1 node) to restart.

Solution

  • Having 793 primary shards on just 1 Elasticsearch node is a big NO. Elasticsearch's scalability comes from its distributed nature. I also noticed that you don't have any replicas, so your cluster isn't resilient either: if a primary shard gets corrupted, there is no recovery mechanism for its data.

    Coming to your question: how many shards should you shrink to?
    It depends entirely on your requirements. If you have a lot of data (more than a few hundred GB), you should split it across multiple primary shards, and those shards should live on multiple nodes; this improves performance (from smaller shards and additional hardware) and provides horizontal scalability.

    But if your data is small, i.e. the total index size is a few GB, then having multiple shards actually hurts performance, since each shard holds only a little data; in that case, keeping everything in a single shard improves performance considerably, as sketched below.
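    For example, a small index can be created with a single primary shard up front (a minimal sketch; the index name my-small-index and the settings values are placeholders, not from the question):

      PUT /my-small-index
      {
        "settings": {
          "index.number_of_shards": 1,
          "index.number_of_replicas": 1
        }
      }

    Note that number_of_shards is fixed at index creation time; changing it afterwards requires the shrink API or a reindex.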

    Please refer to this guide for more info on sharding strategy.

    You can reduce the number of shards using the shrink index API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-shrink-index.html) that user The_Pingu provided, although the exact procedure depends on which version of ES you are using.
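    As a rough illustration (the index names my-source-index and my-target-index are placeholders; verify the exact syntax against the docs for your ES version), shrinking is a two-step process. First, block writes on the source index (on your single-node cluster, the requirement that all shards live on one node is already satisfied):

      PUT /my-source-index/_settings
      {
        "settings": {
          "index.blocks.write": true
        }
      }

    Then shrink it into a new index with fewer primary shards (the target shard count must be a factor of the source's, e.g. 5 -> 1):

      POST /my-source-index/_shrink/my-target-index
      {
        "settings": {
          "index.number_of_shards": 1
        }
      }

    Before going down that path, though, I would suggest considering the architectural aspects below.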

    1. How many nodes you should have in the ES cluster; a one-node ES cluster is definitely not recommended in production, as you lose high availability even if you have little data.
    2. How many shards you need.
    3. How many replicas you need (although this can be changed easily at runtime without much overhead; see the example below).
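    For instance, the replica count can be changed on a live index at any time (a sketch; my-index is a placeholder, and on a one-node cluster the replica will stay unassigned and the index will remain yellow until a second node joins):

      PUT /my-index/_settings
      {
        "index": {
          "number_of_replicas": 1
        }
      }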