python, python-2.7, scrapy, scrapyd

Scrapy log issue


I have multiple spiders in one project. The problem is that right now I am defining LOG_FILE in settings.py like this:

LOG_FILE = "scrapy_%s.log" % datetime.now()

What I want is scrapy_SPIDERNAME_DATETIME, but I am unable to include the spider name in the log file name.

I found

scrapy.log.start(logfile=None, loglevel=None, logstdout=None)

and called it in each spider's __init__() method (roughly as sketched below), but it's not working.
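
Roughly, what I tried looks like this (simplified example, spider name is just a placeholder):

from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self, name=None, **kwargs):
        super(MySpider, self).__init__(name, **kwargs)
        # try to start a log file that includes the spider name
        log.start(logfile="scrapy_%s_%s.log" % (self.name, datetime.now()))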

Any help would be appreciated.


Solution

  • Calling log.start() from the spider's __init__() is not enough on its own, because the log observer has already been started by that point; you need to reset the logging state to trick Scrapy into (re)starting it with the new file name.

    In your spider class file:

    from datetime import datetime
    from scrapy import log
    from scrapy.spider import BaseSpider
    
    class ExampleSpider(BaseSpider):
        name = "example"
        allowed_domains = ["example.com"]
        start_urls = ["http://www.example.com/"]
    
        def __init__(self, name=None, **kwargs):
            LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
            # remove the current log
            # log.log.removeObserver(log.log.theLogPublisher.observers[0])
            # re-create the default Twisted observer which Scrapy checks
            log.log.defaultObserver = log.log.DefaultObserver()
            # start the default observer so it can be stopped
            log.log.defaultObserver.start()
            # trick Scrapy into thinking logging has not started
            log.started = False
            # start the new log file observer
            log.start(LOG_FILE)
            # continue with the normal spider init
            super(ExampleSpider, self).__init__(name, **kwargs)
    
        def parse(self, response):
            ...
    

    And the output file might look like:

    scrapy_example_2012-08-25 12:34:48.823896.log
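
    The timestamp in that name is just str(datetime.now()), which includes spaces and microseconds. If you would rather avoid spaces in the filename, you could format the timestamp explicitly, for example:

    LOG_FILE = "scrapy_%s_%s.log" % (
        self.name, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
    # gives e.g. scrapy_example_2012-08-25_12-34-48.log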