Tags: elasticsearch, reindex

_reindex has suddenly stopped working inexplicably


I've been using the _reindex endpoint for about two months without any problems.

I'm running ES 8.6.2 on port 9500. The OS is Windows 10.

For some reason, as of this morning, no reindex attempt ever completes. The indices in question are only about 20 MB or so, yet I've let the operation run for as long as 3 minutes. I've tried issuing the command from my Python program and also from Insomnia.
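
For reference, a minimal sketch of the kind of call I'm making, except submitted as a background task so it can be polled via the Tasks API instead of blocking (this assumes ES at http://localhost:9500 with security disabled; "source_index" and "temp_xxx3" stand in for my real index names):

import requests

ES = "http://localhost:9500"

# Submit the reindex as a background task instead of a blocking call.
resp = requests.post(
    f"{ES}/_reindex",
    params={"wait_for_completion": "false"},
    json={"source": {"index": "source_index"}, "dest": {"index": "temp_xxx3"}},
)
task_id = resp.json()["task"]

# Poll the Tasks API to see whether the reindex is making progress or stuck.
status = requests.get(f"{ES}/_tasks/{task_id}").json()
print(status["completed"], status["task"]["status"])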

I've also checked that the index I'm trying to reindex definitely exists, and I've restarted the ES server. Other ES commands appear to be working fine.

I've tried rebooting.

I notice that when I cancel the operation after many seconds and then run GET _cat/indices, I now see the new index I was attempting to create with status "red" and apparently no documents (and no stated size).
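
A quick sketch for listing just the red indices (same endpoint assumption as above):

import requests

# List only the indices whose health is red, as JSON rows.
resp = requests.get(
    "http://localhost:9500/_cat/indices",
    params={"health": "red", "format": "json"},
)
for idx in resp.json():
    print(idx["index"], idx["health"], idx["status"], idx["docs.count"])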

Any suggestions about what might be causing this?

Later

Output from _cluster/allocation/explain, as suggested by Val:

{
    "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
    "index": "temp_xxx3",
    "shard": 0,
    "primary": true,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "INDEX_CREATED",
        "at": "2024-02-02T08:41:27.584Z",
        "last_allocation_status": "no"
    },
    "can_allocate": "no",
    "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
    "node_allocation_decisions": [
        {
            "node_id": "GcqJ5quJRwSfXBVDba4cdw",
            "node_name": "node862-1",
            "transport_address": "127.0.0.1:9301",
            "node_attributes": {
                "ml.allocated_processors": "8",
                "ml.machine_memory": "17031471104",
                "xpack.installed": "true",
                "ml.allocated_processors_double": "8.0",
                "ml.max_jvm_size": "2147483648"
            },
            "node_decision": "no",
            "weight_ranking": 1,
            "deciders": [
                {
                    "decider": "disk_threshold",
                    "decision": "NO",
                    "explanation": "the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%], having less than the minimum required [27gb] free space, actual free: [16.5gb], actual used: [93.8%]"
                }
            ]
        }
    ]
}

... now searching on this "watermark" thing...
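
Per the "note" field in that response, the API picked a random unassigned shard; a sketch for targeting this shard explicitly (same endpoint assumption as above):

import requests

# Ask specifically about primary shard 0 of the index that came up red.
resp = requests.get(
    "http://localhost:9500/_cluster/allocation/explain",
    json={"index": "temp_xxx3", "shard": 0, "primary": True},
)
print(resp.json()["allocate_explanation"])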


Solution

  • An index that gets created red usually means that its shards cannot be assigned. There can be many reasons for this.

    One of them is that the disk is nearly full, so the shards can't be allocated; you can check your disk watermarks to make sure.

    There may also be allocation rules in place that prevent the shards from being assigned to any node.

    In any case, if you run GET _cluster/allocation/explain, you'll see immediately why the shards of that index don't get allocated.

    If it turns out that you crossed the watermark, you have a few options:

    1. You can increase the watermark setting (a quick sketch follows this list), but you're only kicking the can down the road.
    2. You can increase the available disk space or add a new node.
    3. As a last resort, you can delete some data.
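
    A minimal sketch covering the watermark check and option 1, assuming the same unsecured http://localhost:9500 endpoint as above (remember to revert the setting once disk space has actually been freed):

    import requests

    ES = "http://localhost:9500"

    # Check the current disk watermark settings (defaults included).
    current = requests.get(
        f"{ES}/_cluster/settings",
        params={
            "include_defaults": "true",
            "filter_path": "*.cluster.routing.allocation.disk*",
        },
    )
    print(current.json())

    # Stopgap: raise the high watermark from the default 90% to 94% so
    # shards can be allocated again. This only buys time; free disk space
    # or add a node for a real fix.
    resp = requests.put(
        f"{ES}/_cluster/settings",
        json={
            "persistent": {
                "cluster.routing.allocation.disk.watermark.high": "94%"
            }
        },
    )
    print(resp.json())  # expect {"acknowledged": true, ...}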