
Elasticsearch alias has more than one write index (not a duplicate of any other question)


I have an Elasticsearch cluster running on k8s, with one StatefulSet for the Elasticsearch master nodes (3 of them) and another StatefulSet for the Elasticsearch data nodes (15 of them).

During shard reallocation, triggered by a few of the data nodes reaching their capacity, several data nodes now fail on startup with the following error:

uncaught exception in thread [main] org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: alias [alias-id_100536] has more than one write index [index-abc ,index-def]

We have encountered this issue before. The solution that worked for us then was to find the hash (the index UUID) of the index via the _cat/indices/index-abc API, shell into the VM of the data node, and delete the entire directory named after that hash. Once the data node was back up, the index returned to its desired replica count, so we had no data loss.

This time, however, when we call the _cat/indices API for the index reported as a write index, the master node says there is no index with that name. Of the two indices named in the error, we can find only one on the master nodes.

We could try deleting them one by one on the data node after finding the hash from the _cat/indices API, but I wanted to know what the ideal recovery method for this would be.

Elasticsearch version: 6.7.2

Thanks


Solution

  • I was able to solve this problem.

    We ended up writing a script that lists the directories under the indices path of each Elasticsearch data node.

    That gave us a list of all UUIDs present on that data node. Next, we ran the _cat/indices?format=json API call against the master node and collected the UUIDs it returned. We then computed the set difference: set(<on-disk UUIDs>) - set(<_cat/indices UUIDs>).

    The result was a list of all indices that were dangling, that is, present on the data node's disk but unknown to the master. We manually deleted those directories with rm -rf, and this brought the nodes back up.
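
The set difference described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: the function names are hypothetical, the indices path you pass in (something like <path.data>/nodes/0/indices) is an assumption about your node layout, and the JSON is assumed to be the saved output of _cat/indices?format=json, which includes a "uuid" field per index. The sketch only reports candidates; it deliberately deletes nothing.

```python
import json
import os
from typing import Iterable, Set


def uuids_on_disk(indices_path: str) -> Set[str]:
    """Return the directory names under a data node's indices path.

    Assumption: each index lives in a directory named after its UUID.
    """
    return {
        entry
        for entry in os.listdir(indices_path)
        if os.path.isdir(os.path.join(indices_path, entry))
    }


def uuids_in_cluster(cat_indices_json: str) -> Set[str]:
    """Extract the "uuid" field from _cat/indices?format=json output."""
    return {row["uuid"] for row in json.loads(cat_indices_json)}


def dangling_uuids(on_disk: Iterable[str], known: Iterable[str]) -> Set[str]:
    """UUIDs present on disk that the master does not know about."""
    return set(on_disk) - set(known)
```

To use it, save the _cat/indices?format=json response to a file, read it into a string, and call dangling_uuids(uuids_on_disk(path), uuids_in_cluster(text)). Review the resulting list by hand before removing any directory, since a mistake in the path or a stale API response would flag live indices.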