Search code examples
elasticsearchsharding

ElasticSearch: Unassigned Shards, how to fix?


I have an ES cluster with 4 nodes:

number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false, data: true
search04 - master: false, data: true

I had to restart search03, and when it came back, it rejoined the cluster no problem, but left 7 unassigned shards laying about.

{
  "cluster_name" : "tweedle",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 15,
  "active_shards" : 23,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 7
}

Now my cluster is in yellow state. What is the best way to resolve this issue?

  • Delete (cancel) the shards?
  • Move the shards to another node?
  • Allocate the shards to the node?
  • Update 'number_of_replicas' to 2?
  • Something else entirely?

Interestingly, when a new index was added, that node started working on it and played nice with the rest of the cluster, it just left the unassigned shards laying about.

Follow on question: am I doing something wrong to cause this to happen in the first place? I don't have much confidence in a cluster that behaves this way when a node is restarted.

NOTE: If you're running a single node cluster for some reason, you might simply need to do the following:

curl -XPUT 'localhost:9200/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'

Solution

  • OK, I've solved this with some help from ES support. Issue the following command to the API on all nodes (or the nodes you believe to be the cause of the problem):

    curl -XPUT 'localhost:9200/<index>/_settings' \
        -d '{"index.routing.allocation.disable_allocation": false}'
    

    where <index> is the index you believe to be the culprit. If you have no idea, just run this on all nodes:

    curl -XPUT 'localhost:9200/_settings' \
        -d '{"index.routing.allocation.disable_allocation": false}'
    

    I also added this line to my yaml config and since then, any restarts of the server/service have been problem free. The shards re-allocated back immediately.

    FWIW, to answer an oft sought after question, set MAX_HEAP_SIZE to 30G unless your machine has less than 60G RAM, in which case set it to half the available memory.

    References