python logging scrapy error-logging

Where does "ERROR: Spider error processing <GET..." come from in Scrapy?


I am reading the log from a previous run of my spider. I would like to know where this error comes from and how I can act on it:

2019-04-12 22:00:55 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.website.com/next_page> (referer: https://www.website.com/prev_page)
Traceback (most recent call last):...

I looked through middlewares.py, settings.py and the other files in my project, and I cannot find any line that calls logging.error or spider.logger.error. Even in the built-in methods def process_spider_exception(self, response, exception, spider): and def process_exception(self, request, exception, spider): I cannot find any line that emits a log message. The documentation did not make it clear to me either.

Now, about acting on it. I want to know where the message comes from because I would like to add some code that writes the URLs to a file dedicated to the kinds of exceptions that trigger a "Spider error processing" message. I could then analyze the failures, fix them, and relaunch the spider on just the URLs in that file, which is more convenient than working from a Scrapy log file.

Beyond acting on it, I would like to understand where the message is produced and how it works.


Solution

  • To answer your question, that log message comes from the handle_spider_error method in the scrapy package, in

    core/scraper.py

    which matches the logger name [scrapy.core.scraper] shown in your log line.

    To find the source of the error, the best hint is usually the traceback that accompanies this log line.

    You can also trace the code that requests the URL 'https://www.website.com/next_page'.
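
For the second part of the question (collecting the failing URLs in a dedicated file), one possible approach is a small spider middleware built on the process_spider_exception hook mentioned in the question. The following is only a minimal sketch: the class name FailedUrlRecorderMiddleware, the file name failed_urls.txt, the tab-separated line format and the myproject module path are all assumptions for illustration, not something Scrapy provides.

```python
from pathlib import Path


class FailedUrlRecorderMiddleware:
    """Hypothetical spider middleware: records URLs whose callback raised."""

    def __init__(self, path="failed_urls.txt"):
        # The output file name is an assumption; use any location you like.
        self.path = Path(path)

    def process_spider_exception(self, response, exception, spider):
        # Scrapy calls this hook when a spider callback raises an exception.
        # Append the URL and the exception type, one tab-separated pair per line.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(f"{response.url}\t{type(exception).__name__}\n")
        # Returning None lets Scrapy continue its normal handling, so the
        # "Spider error processing" log line is still emitted.
        return None


# To enable it, something like this would go in settings.py
# (module path and priority are assumptions):
# SPIDER_MIDDLEWARES = {
#     "myproject.middlewares.FailedUrlRecorderMiddleware": 543,
# }
```

After a run, failed_urls.txt would hold one line per failing response, which can then be fed back to the spider as start URLs. Because the hook returns None, the exception is not swallowed and the traceback still appears in the log.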