Tags: python, scrapy, scrapy-pipeline, scrapy-settings

Scrapy: How to access the custom, CLI passed settings from the __init__() method of a spider class?


I need to access the custom settings passed from the CLI using:

-s SETTING_NAME="SETTING_VAL" from the __init__() method of the spider class.

get_project_settings() allows me to access only the static settings.
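
For reference, this is roughly what I tried (a sketch; SETTING_NAME is just a placeholder):

from scrapy.utils.project import get_project_settings

settings = get_project_settings()
# Only returns what is defined in settings.py / the defaults,
# not values overridden on the command line with -s
print(settings.get('SETTING_NAME'))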

The docs explain how you can access those custom settings from a pipeline by defining a from_crawler() classmethod:

@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings

But is there any way to access them from the __init__() spider method?


Solution

  • Just use self.settings.get(), e.g.

    print(self.settings.get('SETTING_NAME'))
    

    will print

    SETTING_VAL
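
    self.settings is only populated once the crawler has been attached to the spider, so the natural place to read it is a callback rather than __init__. A minimal sketch (the spider name and URL are illustrative):

    import scrapy

    class SettingsDemoSpider(scrapy.Spider):
        name = "settings_demo"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # By now the crawler is attached, so CLI overrides
            # passed with -s are visible through self.settings
            print(self.settings.get("SETTING_NAME"))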
    

    If you want to access a setting in your spider's __init__ you have a couple of options. If your command-line option is really just a spider argument, use -a instead of -s (see the sketch below). If for some reason you need to access an actual setting in your spider's __init__, then you have to override the from_crawler classmethod as described in the docs.
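
    For the first option, anything passed with -a arrives as a keyword argument to __init__. A minimal sketch (the spider and argument names are made up):

    import scrapy

    class CategorySpider(scrapy.Spider):
        name = "category_spider"

        def __init__(self, category=None, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Populated from the CLI:
            # scrapy crawl category_spider -a category=books
            self.category = category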

    Here is an example of the from_crawler approach:

    import scrapy

    class ArgsSpider(scrapy.Spider):
        name = "my_spider"

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # my_setting arrives here, injected by from_crawler below
            print('kwargs =', kwargs)

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            # Build the spider ourselves so we can pass the setting
            # into __init__ as a keyword argument
            spider = cls(
                *args,
                my_setting=crawler.settings.get("MY_SETTING"),
                **kwargs
            )
            spider._set_crawler(crawler)
            return spider
    

    Run it with e.g. scrapy runspider args_spider.py -s MY_SETTING=hello,world! and you will see your setting in the kwargs dict. You can of course get other settings this way too.
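
    If you would rather not call the private _set_crawler() yourself, a variant is to delegate to the base class's from_crawler() and inject the setting into kwargs (a sketch, reusing the same hypothetical MY_SETTING):

    import scrapy

    class ArgsSpider(scrapy.Spider):
        name = "my_spider"

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            # Inject the setting, then let the base class build the
            # spider and attach the crawler for us
            kwargs.setdefault("my_setting", crawler.settings.get("MY_SETTING"))
            return super().from_crawler(crawler, *args, **kwargs)

    The base Spider.__init__ stores unknown keyword arguments as attributes, so the value is then available as self.my_setting.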