hibernate, lucene, hibernate-search

Hibernate Automatic indexing alternative


We have an Oracle database with about 130 tables. Only two of them, both CMS tables, are used for full-text search. These two tables are properly configured using Hibernate Search/Lucene annotations.

The problem is that whenever there is a CRUD operation on any table, Hibernate runs some queries against these two CMS tables, which we believe is slowing the operations down. We know about this behavior from the Hibernate Search documentation:

3.1.2. Automatic indexing

By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the according Lucene index....

We also know that we can avoid this by switching to manual indexing, as described in the documentation. But we don't want to index manually, because we don't want to handle that in code.

We like the automatic indexing, but we need to configure Hibernate so that the index is checked or updated only when a row in one of the two CMS tables is inserted, updated, or deleted. Is there any way to do this out of the box? I think this is a very valid use case.

We are using:

  • hibernate.core.version - 4.2.15.Final
  • hibernate.search.version - 4.3.0.Final
  • lucene-core - 3.6.2

UPDATE 21/01/2015 - 17:44 GMT

I have done some more testing and I can clearly see that the indexes are updated when an unrelated entity is updated or inserted. We are using a @ClassBridge (for extracting text from Word, PDF, and similar files) and I can see that execution reaches our ClassBridge implementation and calls document.add(...). This is very weird!


Solution

  • As one of the authors of the documentation, apologies for the wording being unclear.

    What Hibernate Search actually does is load only the data it strictly needs to keep the indexes in sync, and only for those entities which are indexed. The phrase

    updates the according Lucene index

    is meant to suggest that if there is no "according" index that needs updating, no index work is done.

    So it already behaves as you describe, and it's even a bit smarter: it updates the index only if the update operation actually affects one or more of the indexed properties.

    For example, if you have an Indexed Entity "Person" with an indexed attribute "name" and a non-indexed attribute "email", it will issue an update operation on the "Person" index when you're updating a Person entity to update the "name", but it will skip the operation if you're only changing the "email" attribute.

    If you're having performance issues, I would suggest using diagnostic tools to get information on what is happening rather than trying to guess.
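
The decision rule described in the "Person" example above can be sketched in plain Java. This is only an illustration of the rule as stated in the answer, not Hibernate Search's actual implementation: the class `IndexUpdateSketch`, the method `needsIndexUpdate`, and the non-indexed "Invoice" entity are hypothetical names introduced here, and the map of indexed properties is hard-coded where the real library would read it from the mapping annotations.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; this is NOT Hibernate Search's internal code.
public class IndexUpdateSketch {

    // Hypothetical registry: indexed entity name -> names of its indexed properties.
    // Entities that are not indexed (for example "Invoice") simply have no entry.
    private static final Map<String, Set<String>> INDEXED_PROPERTIES = new HashMap<>();
    static {
        INDEXED_PROPERTIES.put("Person", new HashSet<>(Arrays.asList("name")));
    }

    // Returns true only if the entity is indexed AND at least one of the
    // changed properties is one of its indexed properties.
    public static boolean needsIndexUpdate(String entityName, Set<String> changedProperties) {
        Set<String> indexed = INDEXED_PROPERTIES.get(entityName);
        if (indexed == null) {
            return false; // entity has no index, so there is nothing to update
        }
        // disjoint(...) is true when the two sets share no element
        return !Collections.disjoint(indexed, changedProperties);
    }

    public static void main(String[] args) {
        // Indexed property "name" changed: index update needed.
        System.out.println(needsIndexUpdate("Person", Collections.singleton("name")));   // true
        // Only the non-indexed "email" changed: update skipped.
        System.out.println(needsIndexUpdate("Person", Collections.singleton("email")));  // false
        // Entity without any index: update skipped.
        System.out.println(needsIndexUpdate("Invoice", Collections.singleton("total"))); // false
    }
}
```

If index writes show up for an entity that would fall into the third case, that contradicts the behavior described above and is worth checking with diagnostic tools, for instance by looking at how that entity relates to the indexed ones.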