When I turn on DUPEFILTER_DEBUG, I get:
2016-09-21 01:48:29 [scrapy] DEBUG: Filtered duplicate request: <GET http://www.example.org/example.html>
The problem is that I need to know the duplicate request's referrer to debug my code. How can I log the referrer?
One option would be a custom filter based on the built-in RFPDupeFilter:
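    from scrapy.dupefilters import RFPDupeFilter

    class MyDupeFilter(RFPDupeFilter):
        def log(self, request, spider):
            # Log the Referer header of the filtered request (None if it is not set),
            # then fall back to the default duplicate logging.
            referer = request.headers.get("Referer")
            self.logger.debug("Duplicate request referer: %s", referer, extra={'spider': spider})
            super(MyDupeFilter, self).log(request, spider)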
Don't forget to set the DUPEFILTER_CLASS setting to point to your custom class.
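For example, the relevant lines in settings.py might look like the sketch below. The module path myproject.dupefilters is only an assumption about where the class is saved; adjust it to your own project layout.

```python
# settings.py (sketch)
# "myproject.dupefilters" is a hypothetical module path; point it at the
# module where MyDupeFilter actually lives in your project.
DUPEFILTER_CLASS = "myproject.dupefilters.MyDupeFilter"

# Keep duplicate-filter debug logging on so every filtered request is reported.
DUPEFILTER_DEBUG = True
```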
(not tested)