Tags: lucene, solr, dataimporthandler

Solr DIH -- How to handle deleted documents?


I'm playing around with a Solr-powered search for my webapp, and I figured it'd be best to use the DataImportHandler (DIH) to keep the index in sync with the app via the database. I like the elegance of just checking the last_updated_date field. Good stuff. However, I don't know how to handle deleted documents with this approach.

The way I see it, I've got two choices. I could either send an explicit delete message to Solr from the client when a document is deleted, or I could add a "deleted" flag and leave the row in the database, so that Solr notices the document has changed and is now "deleted". I could then add a query filter that excludes results with the deleted flag, but it seems inefficient to keep all the deleted documents in the Lucene index. What do other folks do?


Solution

  • These are your options:

    • Use DIH special commands $deleteDocById or $deleteDocByQuery (requires Solr 1.4+)
    • Use the clean parameter of DIH to delete the whole index before importing.
    • Use preImportDeleteQuery to define what's going to be cleaned up before importing. (requires Solr 1.4+)
    • Use database triggers instead of DIH to manage updating the index.
    • If you're using some sort of ORM, use its interception capabilities instead of DIH. For example, you can use Hibernate events to update the index on update, insert, or delete.
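
For the non-DIH routes (a trigger-driven job, an ORM hook, or the "explicit message from the client" idea in the question), the delete itself is just a POST to Solr's update handler. Here is a minimal sketch of building that request, assuming a Solr version recent enough to accept JSON updates (newer than the 1.4 mentioned above). The host, the core name `mycore`, and the helper name `build_delete_request` are placeholders for illustration, not anything from the original setup.

```python
import json
import urllib.request

# Assumed URL: adjust the host and the core name ("mycore") to your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_delete_request(doc_ids=None, query=None, update_url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST that deletes by id or by query.

    Hypothetical helper: it only assembles the request so the payload
    can be inspected before anything touches the index.
    """
    if doc_ids:
        # Delete-by-id: a JSON list of unique key values.
        body = {"delete": [str(doc_id) for doc_id in doc_ids]}
    elif query:
        # Delete-by-query: any Solr query, e.g. "deleted:true".
        body = {"delete": {"query": query}}
    else:
        raise ValueError("pass either doc_ids or query")

    return urllib.request.Request(
        update_url + "?commit=true",  # commit so the delete becomes visible
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would actually send it, for example `urlopen(build_delete_request(doc_ids=[42]))` from whatever hook fires when a row is removed. Committing on every single delete is the simple choice here; batching the ids and committing once is probably kinder to a busy index.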