
How can I be notified in real-time when Solr is ready to accept another import request?


I'm writing a simple queue that sends update requests to a Solr core's DataImportHandler. This handler updates the Solr core by running a query against the database. When one update is complete, I'd like to immediately send the next update request. However, I'm having some trouble detecting when Solr is ready to accept another update request. Here's what I've tried:

  • The onImportEnd event: Using the onImportEnd event seemed like the most obvious way to go. I created a custom event listener that makes a network request back to my application to indicate that Solr is ready to accept another request. Unfortunately, it seems that this event is called after Solr is done importing but before its status is idle. If my application makes a second request immediately after the onImportEnd event, the request returns with a "busy" status.

  • The postCommit event: I created an .exe that runs on the postCommit event. This executable seems to run during the import process - Solr doesn't return to "idle" status until this executable is finished.

  • The postOptimize event: This event is never called.

  • Polling for status changes: This method would work, but it would mean a delay between each update request. I'd like these requests to be executed as fast as possible.

Is there another way to detect when Solr is ready to accept another update request?


Solution

  • What I have done in similar scenarios:

    1. Add more DIH handlers. You can have as many as you want, each pointing to its own XML config file if necessary. Add 10 of them, for example.
    2. Each time you have a piece of data that needs a DIH handler, iterate over all of them until you find a free one (add a sleep() for sanity if all are busy). This has worked well for me for large-volume indexing.

    Of course, this requires your indexing operations to be parallelizable; if they aren't, this setup won't work.

    By the way, postOptimize would be called when you call optimize, not commit.
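
    For illustration, the handler-pool idea from steps 1 and 2 could be sketched roughly like this in Python. This is only a sketch under some assumptions: the handler paths (/dataimport1, /dataimport2, ...) and the base URL are made up and must match whatever you registered in solrconfig.xml, and the helper names fetch_status and find_idle_handler are hypothetical. The only Solr behavior it relies on is that a DIH handler answers command=status with a "status" field of "idle" or "busy".

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def fetch_status(base_url, handler):
    """Ask one DIH handler for its status ("idle" or "busy").

    base_url is assumed to look like "http://localhost:8983/solr/mycore"
    and handler like "/dataimport1"; both are placeholders.
    """
    query = urlencode({"command": "status", "wt": "json"})
    with urllib.request.urlopen(f"{base_url}{handler}?{query}") as resp:
        return json.load(resp).get("status")


def find_idle_handler(handlers, get_status, retry_delay=0.5,
                      max_rounds=None, sleep=time.sleep):
    """Return the first handler whose status is "idle".

    Checks every handler in order. If all are busy, sleeps for
    retry_delay seconds and tries again, up to max_rounds full passes
    (forever when max_rounds is None). Returns None if nothing frees up.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for handler in handlers:
            if get_status(handler) == "idle":
                return handler
        rounds += 1
        sleep(retry_delay)
    return None
```

    You would call it with something like find_idle_handler(handlers, lambda h: fetch_status(base_url, h)) and then send your import command to the handler it returns. Note there is still a small window between the status check and the import request in which another client could grab the same handler, so if several producers share the pool you should still check the response to the import request for "busy".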