I am scraping walmart.com using Scrapy. Fetching https://www.walmart.com/ works fine, but when I try to fetch "https://www.walmart.com/search?q=tablets&typeahead=tabltes" the error below appears. I have already set ROBOTSTXT_OBEY to False and am using scrapy-fake-useragent.
```
2024-02-14 09:42:25 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.walmart.com/search?q=tablets&typeahead=tabltes>
```

My spider:
```python
import scrapy


class Wal1Spider(scrapy.Spider):
    name = "wal1"
    allowed_domains = ["walmart.com"]
    start_urls = ["https://walmart.com"]
    custom_settings = {
        "DOWNLOAD_DELAY": 6.3,
        "RANDOMIZE_DOWNLOAD_DELAY": True,
        "COOKIES_ENABLED": False,
        "AUTOTHROTTLE_ENABLED": True,
        "AUTOTHROTTLE_START_DELAY": 2,
        "AUTOTHROTTLE_MAX_DELAY": 11.7,
        "AUTOTHROTTLE_TARGET_CONCURRENCY": 1,
        "CONCURRENT_REQUESTS": 4,
        "ROBOTSTXT_OBEY": False,
    }

    def parse(self, response):
        pass
```
I also get this deprecation warning from scrapy-fake-useragent:

```
2024-02-14 09:42:25 [py.warnings] WARNING: C:\Users\SADAM1\PycharmProjects\untitled4\venv\lib\site-packages\scrapy_fake_useragent\middleware.py:95: ScrapyDeprecationWarning: Attribute RetryMiddleware.EXCEPTIONS_TO_RETRY is deprecated. Use the RETRY_EXCEPTIONS setting instead.
  if isinstance(exception, self.EXCEPTIONS_TO_RETRY)
```
If you're using the Scrapy shell, the settings defined in your spider's custom_settings aren't applied. You can pass that specific option to scrapy shell through the --set flag, which sets/overrides settings:

```
$ scrapy shell --set="ROBOTSTXT_OBEY=False"
```
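Equivalently, the short -s form works, and you can hand the URL to the shell directly so it is fetched on startup (the URL here is just the one from the example below):

```
$ scrapy shell -s ROBOTSTXT_OBEY=False "https://www.walmart.com/search?q=tablets&typeahead=tablte"
```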
Once in the shell:
fetch("https://www.walmart.com/search?q=tablets&typeahead=tablte")
# 2024-02-14 19:24:29 [scrapy.core.engine] INFO: Spider opened
# 2024-02-14 19:24:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.walmart.com/search?q=tablets&typeahead=tablte> (referer: None)
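You can also confirm the override took effect from inside the shell, since it exposes a settings shortcut (a quick sanity check, not something the fix depends on):

```
>>> settings.getbool("ROBOTSTXT_OBEY")
False
```

Note that when you run the spider itself with scrapy crawl wal1, the custom_settings from your spider are applied, so ROBOTSTXT_OBEY=False should hold there without any flag. If you want it everywhere, including the shell, a minimal sketch assuming a standard Scrapy project layout is to set it once in settings.py:

```python
# settings.py -- project-wide setting, also picked up by scrapy shell
ROBOTSTXT_OBEY = False
```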