solr solrcloud

Solr Cloud - Solr Hanging / can't start OR Could not fully create collection: <collection_name>


Creating and deleting Solr collections no longer works. When I launch a creation (via curl), I get the following response after 30 seconds:

Error 500 - Could not fully create collection: <collection_name>

EDIT: I hit the same issue again: Solr couldn't fully restart, or was hanging.

HDP: 2.6.2
Solr(Cloud): 5.5.5
ZK: 3.4.6


Solution

  • I struggled with this problem for many days!

    It turned out that the overseer queue in ZooKeeper had grown too large:

    zkCli.sh -server zkhost:2181 ls /solr/overseer/queue
    zkCli.sh -server zkhost:2181 ls /solr/overseer/queue-work

    Both commands returned several hundred thousand entries, and the number kept growing.
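    Listing several hundred thousand entries is slow and hard to read, so it may be easier to ask ZooKeeper only for the child count. The sketch below assumes the `stat` command of zkCli.sh, which prints a `numChildren = N` line; `count_children` is a hypothetical helper name, not part of Solr or ZooKeeper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of `zkCli.sh ... stat <path>` on stdin
# and prints only the value of the "numChildren = N" line.
count_children() {
  awk -F' = ' '/^numChildren/ { print $2 }'
}
```

    Possible usage, with the same host and path as above: `zkCli.sh -server zkhost:2181 stat /solr/overseer/queue 2>/dev/null | count_children`. Running it a few times in a row shows whether the queue is still growing.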

    Process to recover:
    1. Stop all Solr nodes.
    2. Remove the overseer queues and recreate them (in ZooKeeper 3.4.x, `create` needs a data argument, here `null`, while `rmr` takes only the path):
       zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue
       zkCli.sh -server zkhost:2181 create /solr/overseer/queue null
       zkCli.sh -server zkhost:2181 rmr /solr/overseer/queue-work
       zkCli.sh -server zkhost:2181 create /solr/overseer/queue-work null
    3. Start the Solr nodes.
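
    The ZooKeeper part of this procedure (step 2) could be wrapped in a small script. This is only a sketch: the variable names `ZK_SERVER`, `ZK_CHROOT` and `DRY_RUN` and the function names are my own, it assumes `create` is given `null` as its data argument for both queues, and it defaults to a dry run that only prints the commands. Stopping and starting the Solr nodes is left to you.

```shell
#!/usr/bin/env bash
# Sketch: reset the overseer queues. Run it only while ALL Solr nodes are stopped.
ZK_SERVER="${ZK_SERVER:-zkhost:2181}"  # placeholder host taken from the answer
ZK_CHROOT="${ZK_CHROOT:-/solr}"        # ZooKeeper chroot used in the answer
DRY_RUN="${DRY_RUN:-1}"                # 1 = print the commands, 0 = execute them

# Print or run a single zkCli.sh command, depending on DRY_RUN.
zk() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "zkCli.sh -server $ZK_SERVER $*"
  else
    zkCli.sh -server "$ZK_SERVER" "$@"
  fi
}

# Remove and recreate both overseer queues.
reset_overseer_queues() {
  local q
  for q in queue queue-work; do
    zk rmr "$ZK_CHROOT/overseer/$q"
    zk create "$ZK_CHROOT/overseer/$q" null
  done
}
```

    Calling `reset_overseer_queues` with the defaults prints the four commands so they can be reviewed first; `DRY_RUN=0 reset_overseer_queues` would actually run them.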

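    The code that produces the error can be seen here: https://github.com/apache/lucene-solr/blob/dbed8bafe6ee167361599deaa4f1b5fdbb0b1c32/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L170 Solr tries to create the nodes for the collection, then polls ZooKeeper for 30 seconds to check whether they have been created. If they have not, it fails with "Could not fully create collection:". With an overseer queue that large, the creation is presumably not processed within that window, which would explain the error appearing after 30 seconds.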
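
    To illustrate that "try, then poll until a deadline, then give up" behaviour, here is a generic shell sketch. It is not Solr's code (which is Java); `wait_until` is a hypothetical helper that re-runs a check command until it succeeds or a timeout expires.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits with status 0 (returns 0),
# or until TIMEOUT_SECONDS have elapsed (returns 1).
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  while :; do
    if "$@"; then
      return 0            # the check succeeded before the deadline
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1            # deadline reached, give up
    fi
    sleep "$interval"
  done
}
```

    For example, `wait_until 30 1 collection_exists mycoll || echo "Could not fully create collection: mycoll"`, where `collection_exists` stands for whatever check you write yourself. If the work behind the check is stuck in a backlog, the loop simply runs out of time, which matches the 30-second delay before the error.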