lucene jackrabbit crx

Debugging Jackrabbit Lucene re-index abort/failure


I'm trying to rebuild the Lucene search index on a Jackrabbit 2.0 instance (actually a Day CRX 2.1 instance) so that I can apply new property boost weights for relevancy scoring. However, indexing repeatably aborts at the same point, at count 3173000:

*INFO * MultiIndex: indexing... /content/xxxxxx/jcr:content (3173000) (MultiIndex.java, line 1209)
*INFO * RepositoryImpl: Shutting down repository... (RepositoryImpl.java, line 1139)

(Company names redacted.) This leaves the CRX web instance showing:

java.lang.IllegalStateException: The repository is not available.

There's no indication in the logs of why it's shutting down, and no further lines appear between those two at any more verbose log level. The path mentioned exists and is unremarkable. Jackrabbit logs the path every 100 nodes, so any of the next 100 nodes could be the one causing the failure.

Any idea what could possibly have gone wrong, or how I can debug this?

(This, unfortunately, is one of those I'm-out-of-my-depth questions - I can't tell you much more because I don't know where to look.)


Solution

  • Thanks to everyone for the suggestions in the comments. The problem was that we had some content with bad HTML, specifically an <li>, closed or not, inside a <select><option>:

    <html><body><form>
      <select>
        <option value="1"><li></option>
      </select>
    </form></body></html>
    

    This kills javax.swing.text.html.parser.Parser with a StackOverflowError, which is an Error rather than an Exception and so is not caught by the error handling in Jackrabbit's MultiIndex.

    I've reported the Parser crash to Oracle and I'll propose a patch to Jackrabbit core that adds extra try/catches around the indexing code to at least log the exact node with a problem and, where possible, recover from the error and carry on indexing. In the case of a StackOverflowError I think this is recoverable: by the time we're back in the exception handling code the stack has been unwound to a sensible depth.

    In practice, I'm not going to be allowed to run a modified Jackrabbit in production here, but at least I've identified and fixed the bad content, so the same problem won't bite us there.
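
The recovery idea described above (catch the error per node, log the exact path, carry on indexing) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Jackrabbit patch: `GuardedIndexer`, `extract`, and `indexAll` are hypothetical names, and `extract` merely simulates a parser that recurses without bound on the bad markup rather than calling the real Swing parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GuardedIndexer {

    /**
     * Hypothetical stand-in for text extraction. It simulates a parser bug by
     * recursing forever when it sees an <li> inside an <option>.
     */
    static int extract(String content, int depth) {
        if (content.contains("<option") && content.contains("<li>")) {
            return extract(content, depth + 1); // runaway recursion
        }
        return content.length();
    }

    /**
     * Tries to index every node (path -> content). Paths that index cleanly
     * are appended to 'indexed'; paths that overflow the stack are returned.
     */
    static List<String> indexAll(Map<String, String> nodes, List<String> indexed) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> node : nodes.entrySet()) {
            String path = node.getKey();
            try {
                extract(node.getValue(), 0);
                indexed.add(path);
            } catch (StackOverflowError e) {
                // By the time we get here the stack has unwound back to this
                // frame, so logging and continuing is reasonable. Other Errors
                // (e.g. OutOfMemoryError) are deliberately not caught.
                System.err.println("Skipping node " + path + ": " + e);
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("/content/a", "<p>fine</p>");
        nodes.put("/content/b", "<select><option value=\"1\"><li></option></select>");
        nodes.put("/content/c", "<p>also fine</p>");

        List<String> indexed = new ArrayList<>();
        List<String> failed = indexAll(nodes, indexed);
        System.out.println("indexed=" + indexed + " failed=" + failed);
    }
}
```

Catching only `StackOverflowError` rather than every `Throwable` is a deliberate choice in this sketch: it covers the failure seen here while letting less recoverable errors still propagate.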