I'm trying to alter Scrapy's stats middleware.
Here's Scrapy's stats.py in full:
from scrapy.exceptions import NotConfigured
from scrapy.utils.request import request_httprepr
from scrapy.utils.response import response_httprepr


class DownloaderStats(object):

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        if not crawler.settings.getbool('DOWNLOADER_STATS'):
            raise NotConfigured
        return cls(crawler.stats)

    def process_request(self, request, spider):
        self.stats.inc_value('downloader/request_count', spider=spider)
        self.stats.inc_value('downloader/request_method_count/%s' % request.method, spider=spider)
        reqlen = len(request_httprepr(request))
        self.stats.inc_value('downloader/request_bytes', reqlen, spider=spider)

    def process_response(self, request, response, spider):
        self.stats.inc_value('downloader/response_count', spider=spider)
        self.stats.inc_value('downloader/response_status_count/%s' % response.status, spider=spider)
        reslen = len(response_httprepr(response))
        self.stats.inc_value('downloader/response_bytes', reslen, spider=spider)
        return response

    def process_exception(self, request, exception, spider):
        ex_class = "%s.%s" % (exception.__class__.__module__, exception.__class__.__name__)
        self.stats.inc_value('downloader/exception_count', spider=spider)
        self.stats.inc_value('downloader/exception_type_count/%s' % ex_class, spider=spider)

In the from_crawler classmethod, what is it, exactly, that's getting passed in?

First of all, DownloaderStats(object) doesn't mean that DownloaderStats is being passed an object; it means that the DownloaderStats class extends the object class.
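
If you want to check this for yourself, here is a minimal sketch. The class names Base and Implicit are made up for illustration and are not part of Scrapy. It assumes Python 3, where a class with no parentheses also extends object.

```python
class Base(object):      # explicitly extends the built-in object class
    pass


class Implicit:          # in Python 3 this extends object as well
    pass


# Nothing is "passed in" by the parentheses; they only name the parent class.
print(issubclass(Base, object))
print(Base.__bases__)
print(Implicit.__bases__)
```

Both classes report object as their only base, which is why the (object) part is usually dropped in Python 3 code.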

In your class method, cls is the class the method is being called on, in this case DownloaderStats. So the code cls(crawler.stats) could be thought of as DownloaderStats(crawler.stats), which instantiates an object of the class DownloaderStats. Instantiating an object in Python causes its __init__ method to be called, so the value of crawler.stats gets assigned to the stats parameter of the __init__ method, which then gets assigned to self.stats.
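
To make the flow concrete, here is a rough sketch that runs without Scrapy installed. FakeStats, FakeCrawler and CustomStats are hypothetical stand-ins I made up for illustration; the real crawler object has far more on it than a stats attribute, and the settings check from the original from_crawler is left out to keep the example short.

```python
class FakeStats:
    """Hypothetical stand-in for the stats collector (not Scrapy's class)."""

    def __init__(self):
        self.values = {}

    def inc_value(self, key, count=1, spider=None):
        self.values[key] = self.values.get(key, 0) + count


class FakeCrawler:
    """Hypothetical stand-in for a crawler; it only carries .stats."""

    def __init__(self):
        self.stats = FakeStats()


class DownloaderStats(object):
    def __init__(self, stats):
        # stats here is whatever was passed to cls(...)
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # cls is the class the method was called on, so this line
        # behaves like DownloaderStats(crawler.stats) or, for a
        # subclass, like CustomStats(crawler.stats).
        return cls(crawler.stats)


class CustomStats(DownloaderStats):
    """Hypothetical subclass, as you might write when altering the middleware."""


crawler = FakeCrawler()
mw = DownloaderStats.from_crawler(crawler)
custom = CustomStats.from_crawler(crawler)

print(type(mw).__name__)
print(type(custom).__name__)
print(mw.stats is crawler.stats)
```

Note that calling from_crawler on the subclass gives you an instance of the subclass, because cls is bound to whichever class the call was made on. That is the main reason the method uses cls(...) rather than hard-coding DownloaderStats(...).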