solr apache-storm stormcrawler

Automatically deleting fetched records from Solr when FETCH_ERROR occurs (Solr and StormCrawler integration)


I have integrated Solr with StormCrawler. I need to delete a document from the Solr index once its FETCH_ERROR status has been converted into ERROR after a number of successive failed attempts, but this is not happening at the moment. I read that with Elasticsearch, AbstractStatusUpdaterBolt and DeletionBolt take care of this. Is there a similar deletion bolt for the Solr integration that, together with StatusUpdaterBolt, could delete the record from the Solr index? Any direction would help. Thanks.


Solution

  • As of StormCrawler 1.15, there is no DeletionBolt for Solr. Writing one should not be too difficult; you could use the Elasticsearch one as a template. Sending tuples to the deletion stream is already handled by AbstractStatusUpdaterBolt, so there is nothing to do on that front.

    Feel free to open an issue to ask for this to be added, or even better, contribute a pull request if you can.
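Since the Elasticsearch DeletionBolt is suggested as a template, here is a minimal, dependency-free sketch of the logic a Solr counterpart could follow: take the URL from a tuple on the deletion stream, delete the matching document, then ack the tuple on success or fail it so Storm can replay it. To keep it runnable on its own, Storm's Tuple and OutputCollector and SolrJ's SolrClient are replaced by hypothetical stand-ins (DeletionTuple, Collector, SolrDeleter, plus the in-memory fakes InMemorySolr and RecordingCollector). It also assumes the Solr schema uses the URL itself as the document id; none of these names come from StormCrawler.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Dependency-free sketch of what a Solr DeletionBolt's execute() could do.
 * All nested types are HYPOTHETICAL stand-ins so the logic runs standalone:
 * DeletionTuple ~ org.apache.storm.tuple.Tuple, Collector ~ OutputCollector,
 * SolrDeleter ~ SolrJ's SolrClient#deleteById(collection, id).
 */
public class SolrDeletionSketch {

    /** A tuple from the deletion stream, carrying "url" and "metadata" fields. */
    public static final class DeletionTuple {
        public final String url;
        public final Map<String, String[]> metadata;

        public DeletionTuple(String url, Map<String, String[]> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Stand-in for SolrClient#deleteById; the real call also throws SolrServerException. */
    public interface SolrDeleter {
        void deleteById(String collection, String id) throws IOException;
    }

    /** Stand-in for the ack/fail part of Storm's OutputCollector. */
    public interface Collector {
        void ack(DeletionTuple t);
        void fail(DeletionTuple t);
    }

    private final SolrDeleter solr;
    private final Collector collector;
    private final String collection;

    public SolrDeletionSketch(SolrDeleter solr, Collector collector, String collection) {
        this.solr = solr;
        this.collector = collector;
        this.collection = collection;
    }

    /** Same shape as the ES DeletionBolt: delete by id, ack on success, fail on error. */
    public void execute(DeletionTuple tuple) {
        // Assumption: the Solr schema's uniqueKey is the URL itself.
        // If your schema derives the id from the URL (e.g. a hash), compute it here.
        String id = tuple.url;
        try {
            solr.deleteById(collection, id);
        } catch (IOException e) {
            collector.fail(tuple); // let Storm replay the tuple later
            return;
        }
        collector.ack(tuple);
    }

    /** In-memory fake of a Solr collection, holding document ids only. */
    public static final class InMemorySolr implements SolrDeleter {
        public final Set<String> ids = new HashSet<>();
        public boolean unavailable = false; // simulate Solr being down

        @Override
        public void deleteById(String collection, String id) throws IOException {
            if (unavailable) {
                throw new IOException("Solr unreachable");
            }
            ids.remove(id);
        }
    }

    /** Counts acks and fails instead of talking to Storm. */
    public static final class RecordingCollector implements Collector {
        public int acked = 0;
        public int failed = 0;

        @Override
        public void ack(DeletionTuple t) { acked++; }

        @Override
        public void fail(DeletionTuple t) { failed++; }
    }

    public static void main(String[] args) {
        InMemorySolr solr = new InMemorySolr();
        solr.ids.add("https://example.com/gone");
        RecordingCollector collector = new RecordingCollector();
        SolrDeletionSketch bolt = new SolrDeletionSketch(solr, collector, "docs");

        bolt.execute(new DeletionTuple("https://example.com/gone", Map.of()));
        System.out.println("still indexed: " + solr.ids.contains("https://example.com/gone"));
        System.out.println("acked=" + collector.acked + " failed=" + collector.failed);
    }
}
```

In a real topology, the bolt would extend Storm's BaseRichBolt, hold a SolrJ client, and subscribe to the status updater's deletion stream, in the same way the Elasticsearch DeletionBolt is wired. Failing the tuple on an exception, rather than acking it, means a temporary Solr outage does not silently leave dead URLs in the index.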