I want to build a three-node Elasticsearch cluster on version 7.2, but something unexpected is happening.
I have three virtual machines (192.168.7.2, 192.168.7.3, and 192.168.7.4); their main settings in config/elasticsearch.yml are:
# node-2 (192.168.7.2)
cluster.name: ucas
node.name: node-2
network.host: 192.168.7.2
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
http.cors.enabled: true
http.cors.allow-origin: "*"

# node-3 (192.168.7.3)
cluster.name: ucas
node.name: node-3
network.host: 192.168.7.3
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]

# node-4 (192.168.7.4)
cluster.name: ucas
node.name: node-4
network.host: 192.168.7.4
http.port: 9200
discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
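After starting all three nodes, one quick way to confirm the cluster has actually formed (these checks are not in the original post, just a suggested sanity check) is the cat/health APIs:

```
GET _cat/nodes?v

GET _cluster/health
```

_cat/nodes should list all three nodes, and the cluster health should report "number_of_nodes" : 3.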
When I start each node, create an index named moive with 3 shards and 0 replicas, and then write some docs to the index, the cluster looks healthy:
PUT moive
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
}
}
PUT moive/_doc/3
{
"title":"title 3"
}
Then I set the replica count of moive to 1:
PUT moive/_settings
{
"number_of_replicas": 1
}
Everything goes well, but when I set the replica count of moive to 2:
PUT moive/_settings
{
"number_of_replicas": 2
}
The new replicas cannot be assigned to node-2.
I do not know which step is wrong; please help me figure out what is happening.
First, find the reason why the shard cannot be assigned, using the cluster allocation explain API:

GET _cluster/allocation/explain?pretty

The response shows:
{
  "index" : "moive",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-07-19T06:47:29.704Z",
    "details" : "node_left [tIm8GrisRya8jl_n9lc3MQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "kQ0Noq8LSpyEcVDF1POfJw",
      "node_name" : "node-3",
      "transport_address" : "192.168.7.3:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[kQ0Noq8LSpyEcVDF1POfJw], [R], s[STARTED], a[id=Ul73SPyaTSyGah7Yl3k2zA]]"
        }
      ]
    },
    {
      "node_id" : "mNpqD9WPRrKsyntk2GKHMQ",
      "node_name" : "node-4",
      "transport_address" : "192.168.7.4:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[mNpqD9WPRrKsyntk2GKHMQ], [P], s[STARTED], a[id=yQo1HUqoSdecD-SZyYMYfg]]"
        }
      ]
    },
    {
      "node_id" : "tIm8GrisRya8jl_n9lc3MQ",
      "node_name" : "node-2",
      "transport_address" : "192.168.7.2:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [2.2790256709451573E-4%]"
        }
      ]
    }
  ]
}
The disk_threshold decider is the culprit: the disk on node-2 is nearly full (8.0G of 8.4G used, about 95%, well above the default 85% low watermark), so Elasticsearch refuses to place new shard copies there:
[vagrant@node2 ~]$ df -h
Filesystem               Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root  8.4G  8.0G  480M   95%   /
devtmpfs                 2.4G     0  2.4G    0%   /dev
tmpfs                    2.4G     0  2.4G    0%   /dev/shm
tmpfs                    2.4G  8.4M  2.4G    1%   /run
tmpfs                    2.4G     0  2.4G    0%   /sys/fs/cgroup
/dev/sda1                497M  118M  379M   24%   /boot
none                     234G  149G   86G   64%   /vagrant
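Freeing disk space is the proper fix. As a temporary stopgap (assuming default watermark settings, and remembering to revert afterwards), the watermarks can also be raised with a transient cluster setting:

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
```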
Then I cleaned up the disk space, and everything went back to normal.
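After space is freed, Elasticsearch's disk monitor re-checks the nodes periodically and retries allocation on its own; recovery can be confirmed with, for example:

```
GET _cluster/health/moive

GET _cat/shards/moive?v
```

The index health should return to green, and every copy of each shard should show as STARTED.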