Search code examples
pythonooptwistedscrapy

what is being passed to scrapy DownloaderStats middleware?


I'm trying to alter Scrapy's stats middleware.

Here's Scrapy's stats.py in full:

from scrapy.exceptions import NotConfigured
from scrapy.utils.request import request_httprepr
from scrapy.utils.response import response_httprepr

class DownloaderStats(object):

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        if not crawler.settings.getbool('DOWNLOADER_STATS'):
            raise NotConfigured
        return cls(crawler.stats)

    def process_request(self, request, spider):
        self.stats.inc_value('downloader/request_count', spider=spider)
        self.stats.inc_value('downloader/request_method_count/%s' % request.method, spider=spider)
        reqlen = len(request_httprepr(request))
        self.stats.inc_value('downloader/request_bytes', reqlen, spider=spider)

    def process_response(self, request, response, spider):
        self.stats.inc_value('downloader/response_count', spider=spider)
        self.stats.inc_value('downloader/response_status_count/%s' % response.status, spider=spider)
        reslen = len(response_httprepr(response))
        self.stats.inc_value('downloader/response_bytes', reslen, spider=spider)
        return response

    def process_exception(self, request, exception, spider):
        ex_class = "%s.%s" % (exception.__class__.__module__, exception.__class__.__name__)
        self.stats.inc_value('downloader/exception_count', spider=spider)
        self.stats.inc_value('downloader/exception_type_count/%s' % ex_class, spider=spider)

In the from_crawler classmethod, what is it, exactly, that's getting passed in?


Solution

  • First of all, DownloaderStats(object) doesn't mean that DownloaderStats is being passed an object, it means that the DownloaderStats class extends the object class.

    In your class method, cls is the class being called, in this case DownloaderStats. So the code cls(crawler.stats) could be thought of as DownloaderStats(crawler.stats), which instantiates an object of the class DownloaderStats. Instantiating objects in Python cause their __init__ method to be called, so the value of crawler.stats gets assigned to the stats parameter of the __init__ method, which then gets assigned to self.stats.