
Why does my basic scrapy request get no response?

I am new to scrapy and trying to submit a form and scrape the response from

When I use the scrapy shell:

scrapy shell ""

it opens up the shell but contains no response object. Running


returns None. I've tried using just "" and other variations, but nothing seems to work. The example I followed used "" and it works fine.

Why do I get no response when using a different URL? Does it have to do with HTTPS? Do I need to use a FormRequest to get a response, since the link contains a form? I figured it would at least return the HTML of the form. I plan to 'check' various checkboxes upon submit.

Thanks in advance for any help!


2017-08-09 21:45:43 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: fbg)
2017-08-09 21:45:43 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'fbg.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['fbg.spiders'], 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'BOT_NAME': 'fbg', 'LOGSTATS_INTERVAL': 0}
2017-08-09 21:45:44 [scrapy.middleware] INFO: Enabled extensions:
2017-08-09 21:45:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
2017-08-09 21:45:45 [scrapy.middleware] INFO: Enabled spider middlewares:
2017-08-09 21:45:45 [scrapy.middleware] INFO: Enabled item pipelines:
2017-08-09 21:45:45 [scrapy.extensions.telnet] DEBUG: Telnet console listening on
2017-08-09 21:45:45 [scrapy.core.engine] INFO: Spider opened
2017-08-09 21:45:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET> (referer: None)
2017-08-09 21:45:45 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET>
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x1101058d0>
[s]   item       {}
[s]   request    <GET>
[s]   settings   <scrapy.settings.Settings object at 0x1101059e8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects 
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser


  • Your log says:

    2017-08-09 21:45:45 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET>

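    Your project has the ROBOTSTXT_OBEY setting set to True (it appears under "Overridden settings" in your log), so the robots.txt middleware filters the request out before it is ever downloaded. That is why the shell has a request object but no response. Try either disabling the setting in your project's settings.py or running scrapy shell url -s ROBOTSTXT_OBEY=0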

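    The reason it worked when you "opened a new terminal" is that you probably started the shell from a non-project directory, so Scrapy was no longer picking up this setting from your project. Outside a project, Scrapy falls back to its built-in default, which is ROBOTSTXT_OBEY = False.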
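
To make the filtering decision concrete, here is a minimal sketch using only Python's standard library `urllib.robotparser`. It is not Scrapy's own middleware code, just an illustration of the same kind of check. The robots.txt rules, the `example.com` URLs, the `fbg-bot` user agent string, and the `is_allowed` helper are all hypothetical, since the real site's rules are not shown in the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""


def is_allowed(url, user_agent="fbg-bot", obey_robots=True):
    """Return True if a crawler may fetch `url` under the rules above.

    `obey_robots` plays the role that ROBOTSTXT_OBEY plays in Scrapy:
    when it is False, the robots.txt rules are not consulted at all.
    """
    if not obey_robots:
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Disallowed path while obeying robots.txt: the request would be dropped.
print(is_allowed("https://example.com/forms/search"))                     # False
# Same URL with the check switched off: the request would go through.
print(is_allowed("https://example.com/forms/search", obey_robots=False))  # True
# A path the rules do not mention is allowed either way.
print(is_allowed("https://example.com/about"))                            # True
```

The point of the sketch is that a "Forbidden by robots.txt" message is a decision made on the client side before any download happens, so it is unrelated to HTTPS or to whether the page contains a form.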