Tags: python, python-2.7, web-scraping, scrapy, scrapy-settings

Scrapy - How to get duplicate request referer


When I turn on DUPEFILTER_DEBUG, I get:

2016-09-21 01:48:29 [scrapy] DEBUG: Filtered duplicate request: <GET http://www.example.org/example.html>

The problem is that I need to know the duplicate request's referer in order to debug my code. How can I log the referer of the filtered request?
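
For reference, DUPEFILTER_DEBUG is a standard Scrapy setting that is enabled in the project's settings.py; by default only the first duplicate is logged, while this setting logs every filtered duplicate:

    # settings.py
    # Log every filtered duplicate request instead of only the first one
    DUPEFILTER_DEBUG = True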


Solution

  • One option would be a custom dupe filter based on the built-in RFPDupeFilter:

    from scrapy.dupefilters import RFPDupeFilter
    
    class MyDupeFilter(RFPDupeFilter):
        def log(self, request, spider):
            # Log the duplicate request's Referer header before delegating to
            # the standard "Filtered duplicate request" message.
            self.logger.debug("Duplicate request referer: %(referer)s",
                              {'referer': request.headers.get('Referer')},
                              extra={'spider': spider})
            super(MyDupeFilter, self).log(request, spider)
    

    Don't forget to set the DUPEFILTER_CLASS setting to point to your custom class (see the settings sketch below).

    (not tested)
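
    For example, a minimal settings.py sketch, assuming the custom class above is saved in a (hypothetical) myproject/dupefilters.py module; adjust the dotted path to wherever MyDupeFilter actually lives:

    # settings.py
    # Module path is an assumption -- point it at your own module
    DUPEFILTER_CLASS = 'myproject.dupefilters.MyDupeFilter'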