Creating and deleting Solr collections no longer works. When I create a collection (via curl), I get the following response after 30 seconds:
Error 500 - Could not fully create collection: <collection_name>
EDIT: I ran into the same issue a second time: Solr could not fully restart, or was hanging.
HDP: 2.6.2
Solr(Cloud): 5.5.5
ZK: 3.4.6
I struggled with this problem for many days.
It turned out that the overseer queue in ZooKeeper had grown too large:
zkCli.sh -server zkhost:2181 ls /solr/overseer/queue
and
zkCli.sh -server zkhost:2181 ls /solr/overseer/queue-work
returned several hundred thousand entries, and the count kept growing.
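With that many children, the raw `ls` output is hard to read, so counting the entries is more practical. A minimal sketch, assuming zkCli.sh prints the children as a single bracketed, comma-separated line (as it does on 3.4.x); `count_znodes` is a hypothetical helper name, not part of ZooKeeper or Solr:

```shell
#!/bin/sh
# count_znodes: read zkCli.sh "ls" output on stdin and print the number of children.
# Assumption: the children appear on one line of the form "[a, b, c]".
count_znodes() {
  # Keep only the last line that looks like a bracketed list; log lines are ignored.
  list=$(grep '^\[.*\]$' | tail -n 1)
  # Strip the surrounding brackets.
  list=${list#\[}
  list=${list%\]}
  if [ -z "$list" ]; then
    # No list line found, or an empty list "[]".
    echo 0
  else
    # Children are comma-separated, so the count is the number of commas plus one.
    commas=$(printf '%s' "$list" | tr -cd ',' | wc -c)
    echo $(( commas + 1 ))
  fi
}
```

It could be used as: zkCli.sh -server zkhost:2181 ls /solr/overseer/queue 2>/dev/null | count_znodes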
Process to recover:
1. Stop the Solr nodes
2. Remove the overseer queues and recreate them (on ZooKeeper 3.4.x, create needs a data argument, here null, while rmr takes only the path):
zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue
zkCli.sh -server zkhost:2181 create /solr/overseer/queue null
zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue-work
zkCli.sh -server zkhost:2181 create /solr/overseer/queue-work null
3. Start the Solr nodes
The reason for the 30-second delay can be seen in the code: https://github.com/apache/lucene-solr/blob/dbed8bafe6ee167361599deaa4f1b5fdbb0b1c32/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L170 The code tries to create the nodes for the Solr collection, then polls ZooKeeper for 30 seconds to check whether the nodes have been created. If they have not, it fails with "Could not fully create collection:". With a backlog that large in the overseer queue, the request presumably cannot be processed within that window.
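The "create, then poll until a deadline" pattern described above can be illustrated in shell. This is only a sketch of the general technique, not a translation of the Solr code; `poll_until` and its parameters are hypothetical names:

```shell
#!/bin/sh
# poll_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 on success, 1 on timeout. INTERVAL_SECONDS must be a positive integer.
poll_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Success: the condition we were waiting for is now true.
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  # Deadline reached without success; the caller reports the failure.
  return 1
}
```

For example, poll_until 30 1 test -e /tmp/ready checks once per second for up to 30 seconds whether a (hypothetical) marker file exists. When the work being waited for is stuck behind a huge queue, the check never succeeds and the caller only sees the timeout.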