Sometimes my spider only scrapes half of a page, and the parsing logic then raises an IndexError. How can I retry the request when that happens?
Ideally this would be a middleware, so it could handle multiple spiders at once.
In the end, I used a decorator that calls the _retry()
method of Scrapy's built-in RetryMiddleware
inside the wrapper function. It works well. It's not ideal — a middleware handling this would be cleaner — but it's better than nothing.
import logging

from scrapy.downloadermiddlewares.retry import RetryMiddleware

def handle_exceptions(function):
    def parse_wrapper(spider, response):
        try:
            # Iterate inside the try block so errors raised while the
            # generator runs are caught here.
            for result in function(spider, response):
                yield result
        except IndexError as e:
            logging.error("HTML parsing error on %s: %s", response.url, response.text)
            retry_request = RetryMiddleware(spider.settings)._retry(response.request, e, spider)
            # _retry() returns None once the retry limit is reached,
            # and yielding None from a callback would crash the spider.
            if retry_request is not None:
                yield retry_request
    return parse_wrapper
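One subtlety worth highlighting: the wrapper must consume the decorated generator inside the try block, otherwise the IndexError raised during iteration escapes uncaught. Here is a minimal, Scrapy-free sketch of that pattern; the names (handle_index_error, parse_items, the "retry" fallback) are illustrative, not part of any library:

```python
# Sketch: catching an exception raised *while iterating* a generator.
# The try/except must wrap the for-loop, not just the function call.
def handle_index_error(fallback):
    def decorator(func):
        def wrapper(*args, **kwargs):
            try:
                for item in func(*args, **kwargs):
                    yield item
            except IndexError:
                # In the Scrapy version, this is where the retry
                # request would be yielded instead.
                yield fallback
        return wrapper
    return decorator

@handle_index_error(fallback="retry")
def parse_items(cells):
    yield cells[0]
    yield cells[5]  # raises IndexError when the page is half-scraped
```

Calling `list(parse_items(["a"]))` yields `"a"` and then the fallback, because indexing `cells[5]` fails mid-iteration and the wrapper catches it.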
Then I use the decorator like this:
@handle_exceptions
def parse(self, response):
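For the cleaner middleware route mentioned above, Scrapy's spider-middleware hook process_spider_exception fires when a callback raises, which is enough to reschedule the request. A sketch under stated assumptions — the class name, the parse_retry_times meta key, and the retry cap are all hypothetical, not Scrapy built-ins:

```python
import logging

class ParseErrorRetryMiddleware:
    """Hypothetical spider middleware: retry requests whose callback
    raised IndexError, up to max_retries attempts per request."""

    def __init__(self, max_retries=2):
        self.max_retries = max_retries

    def process_spider_exception(self, response, exception, spider):
        # Scrapy calls this hook when a spider callback raises.
        if not isinstance(exception, IndexError):
            return None  # let other middlewares / default handling run
        retries = response.request.meta.get('parse_retry_times', 0)
        if retries >= self.max_retries:
            logging.error("Giving up on %s after %d retries",
                          response.url, retries)
            return []  # swallow the exception, produce no output
        request = response.request.copy()
        request.meta['parse_retry_times'] = retries + 1
        request.dont_filter = True  # allow re-scheduling the same URL
        logging.warning("Retrying %s (attempt %d) after IndexError",
                        response.url, retries + 1)
        return [request]
```

Enabled via SPIDER_MIDDLEWARES in settings, this would cover every spider in the project without decorating each parse method.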