Tags: solr, load-balancing, amazon-elb

Solr healthcheck for >0 documents


The default Solr configuration provides /admin/ping for load balancer health checks, and it integrates well with the Amazon ELB health checks.

However, we're using master-slave replication. When we provision a new node, Solr starts up and replication begins, but in the meantime /admin/ping returns success before the index has replicated across from the master and the node has any documents.

We'd like nodes to be brought live only once they have completed the first replication and have documents. I don't see any way of doing this with the /admin/ping PingRequestHandler: it always returns success if the search succeeds, even with zero results.

Nor is there any way of matching (or not matching) expected text in the response with the ELB health check configuration.

How can we achieve this?


Solution

  • To expand on the nature of the problem here, the PingRequestHandler will always return success unless:

    1. Its query results in an exception being thrown.
    2. It is configured to use a healthcheck file, and that file is not found.

    Thus my suggestion is that you configure the PingRequestHandler to use a healthcheck file. You can then run a cron job on your Solr system that checks for the existence of documents and creates (or removes) the healthcheck file accordingly. If the healthcheck file is not present, the PingRequestHandler will return an HTTP 503, which should be sufficient for ELB.

    The rough algorithm that I'd use...

    • Every minute, query http://localhost:8983/solr/select?q=*:*
    • If numFound in the response is > 0, then touch /path/to/solr-enabled
    • Else rm /path/to/solr-enabled (optional, depending on your strictness)
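
The steps above could be sketched as a small script run from cron. This is only a minimal sketch under a few assumptions that are not in the original answer: it asks Solr for JSON with `wt=json`, adds `rows=0` so no documents are actually fetched, reads the `numFound` count from the select response, and treats an unreachable Solr as having zero documents. The function names `count_documents` and `update_healthcheck` are hypothetical, and the file path is the placeholder used in the answer.

```python
#!/usr/bin/env python3
"""Toggle a Solr healthcheck file based on whether the index has documents."""
import json
import os
import urllib.request

# Assumed values: adjust both to your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select?q=*:*&rows=0&wt=json"
HEALTHCHECK_FILE = "/path/to/solr-enabled"


def count_documents(raw_response: bytes) -> int:
    """Extract the numFound count from a Solr JSON select response."""
    return int(json.loads(raw_response)["response"]["numFound"])


def update_healthcheck(num_found: int, path: str, strict: bool = True) -> bool:
    """Create the file when documents exist; remove it otherwise if strict.

    Returns True if the healthcheck file exists after the update.
    """
    if num_found > 0:
        # Equivalent of `touch`: create the file if missing, refresh its mtime.
        with open(path, "a"):
            os.utime(path, None)
    elif strict and os.path.exists(path):
        # Equivalent of `rm`: only done in strict mode.
        os.remove(path)
    return os.path.exists(path)


def main() -> None:
    try:
        with urllib.request.urlopen(SOLR_SELECT_URL, timeout=5) as resp:
            num_found = count_documents(resp.read())
    except (OSError, ValueError, KeyError, TypeError):
        # Assumption: an unreachable Solr or malformed reply counts as "no documents".
        num_found = 0
    update_healthcheck(num_found, HEALTHCHECK_FILE)


if __name__ == "__main__":
    main()
```

A crontab entry such as `* * * * * /usr/bin/python3 /path/to/check_solr_docs.py` (script path hypothetical) would run it every minute. Passing `strict=False` corresponds to skipping the optional rm step, so a node that has gone live once stays live.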

    The healthcheck file is configured in the <admin> block of solrconfig.xml. You can use either an absolute path or a filename relative to the directory from which you started Solr.

    <admin>
      <defaultQuery>solr</defaultQuery>
      <pingQuery>q=*:*</pingQuery>
      <healthcheck type="file">/path/to/solr-enabled</healthcheck>
    </admin>
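
Once the healthcheck file is configured, it may be worth confirming that the ping endpoint really changes its status code when the file is created or removed. The following is a rough sketch of such a probe, not part of Solr or ELB; the URL assumes a default single-core setup on port 8983, and `ping_status` and `is_healthy` are hypothetical helper names. It assumes the load balancer treats only an HTTP 200 as healthy.

```python
import urllib.error
import urllib.request

# Assumed ping URL: adjust host, port and core path to your deployment.
PING_URL = "http://localhost:8983/solr/admin/ping"


def ping_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to the ping endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def is_healthy(status: int) -> bool:
    """Assumption: the load balancer accepts only a 200 response as healthy."""
    return status == 200
```

Calling `is_healthy(ping_status(PING_URL))` before and after removing the healthcheck file should flip from True to False if the handler is wired up as described.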
    

    Let me know how that works out! I'm tempted to implement something similar for read slaves at Websolr.