
Adding new documents to the Solr index when using the dataimporthandler


I am using the Solr dataImportHandler to index a database for searching. A delta import is scheduled to run every 5 minutes to pick up changes to the database. However, the delta import only picks up changes to existing items, not newly added items.

What is a good way of getting the new items indexed? Should I be scheduling full imports as well, and if so, is there a good-practice guide on how often to do this? (I haven't been able to find one.) Or should I add the new items to the index with SolrJ (I am using SolrJ for the actual searching), and if so, are there pitfalls around having two different things updating the Solr index? Or does the fact that the delta import doesn't pick up new items indicate that I've done something wrong in my db-data-config file? (That would hardly be surprising.)

If anyone has any experience with this sort of thing it would be great to know what worked for you.

Edited to add: The queries in my db-data-config file look like this:

query="select * from t_monkeys where is_deleted=0"
deltaQuery="select monkey_id from t_monkeys where last_changed_dt > '${dataimporter.last_index_time}'"
deltaImportQuery="select * from t_monkeys where concat('monkey-',monkey_id) = '${dataimporter.delta.monkey_id}'"

Solution

  • If you are using a temporal field (such as 'lastModified') in the delta query, make sure you also set that field when inserting a new row.

    For example, if your table has four columns (id, name, updatedOn, addedOn) and your data-config.xml uses updatedOn to identify changed rows, make sure updatedOn is not null for new rows. A comparison such as updatedOn > '${dataimporter.last_index_time}' is never true when updatedOn is NULL, so such rows are silently skipped by the delta import.
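Here is a minimal sketch of why new rows go missing, using Python's built-in sqlite3 as a stand-in for the real database. It assumes a hypothetical, stripped-down t_monkeys table and a fixed value in place of ${dataimporter.last_index_time}. It runs the same predicate as the question's deltaQuery with and without a default on last_changed_dt. The DEFAULT CURRENT_TIMESTAMP clause is only one way to guarantee the column is set, and the exact syntax depends on your database.

```python
import sqlite3

# Hypothetical stand-in for ${dataimporter.last_index_time}:
# any time earlier than the rows created below.
LAST_INDEX_TIME = "2000-01-01 00:00:00"


def delta_ids(with_default: bool) -> list:
    """Return the ids that the question's deltaQuery predicate would report."""
    conn = sqlite3.connect(":memory:")
    default = "DEFAULT CURRENT_TIMESTAMP" if with_default else ""
    # Hypothetical minimal version of t_monkeys; only the timestamp default varies.
    conn.execute(
        "CREATE TABLE t_monkeys ("
        " monkey_id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " is_deleted INTEGER NOT NULL DEFAULT 0,"
        f" last_changed_dt TEXT {default})"
    )
    # Existing row: the application updates it and stamps the timestamp explicitly.
    conn.execute("INSERT INTO t_monkeys (monkey_id, name) VALUES (1, 'Abu')")
    conn.execute(
        "UPDATE t_monkeys SET name = 'Abu II', last_changed_dt = CURRENT_TIMESTAMP "
        "WHERE monkey_id = 1"
    )
    # New row: inserted without mentioning last_changed_dt at all.
    conn.execute("INSERT INTO t_monkeys (monkey_id, name) VALUES (2, 'Bubbles')")
    # Same comparison as the deltaQuery; NULL > '...' is never true.
    rows = conn.execute(
        "SELECT monkey_id FROM t_monkeys WHERE last_changed_dt > ? ORDER BY monkey_id",
        (LAST_INDEX_TIME,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(delta_ids(with_default=False))  # [1]: the new row is skipped
print(delta_ids(with_default=True))   # [1, 2]: the new row is picked up
```

Setting the timestamp in the INSERT statement itself works just as well. The point is only that new rows must not be left with a NULL in the column the deltaQuery compares against.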