I'm trying to log in using scrapy-splash in exactly the same way as with plain Scrapy. I've looked at the documentation, which says "SplashFormRequest.from_response is also supported, and works as described in scrapy documentation". However, simply changing one line of code and updating the settings as described in the splash documentation doesn't produce any results. What am I doing wrong? Code:
import scrapy
from scrapy_splash import SplashRequest, SplashFormRequest

class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl",]

    def parse(self, response):
        return SplashFormRequest.from_response(
            response,
            formdata={'username': 'pass', 'password': 'pass'},
            callback=self.after_login
        )

    def after_login(self, response):
        print response.body
        if "keyword" in response.body:
            self.logger.error("Success")
        else:
            self.logger.error("Failed")
Added to settings:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
Error log:
python@debian:~/Python/code/lostfilm$ scrapy crawl lost
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: lostfilm)
2017-01-26 20:24:22 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'lostfilm.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['lostfilm.spiders'], 'BOT_NAME': 'lostfilm', 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage'}
2017-01-26 20:24:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2017-01-26 20:24:22 [twisted] CRITICAL: Unhandled error in Deferred:
2017-01-26 20:24:22 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 49, in load_object
raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))
NameError: Module 'scrapy.downloadermiddlewares.httpcompression' doesn't define any object named 'HttpCompresionMiddlerware'
You probably need to perform the first request with Splash too. By default, the start_urls attribute issues "simple" scrapy.Request objects, not SplashRequest. You need to override the start_requests method of your spider:
class MySpider(scrapy.Spider):
    name = 'lost'
    start_urls = ["myurl",]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url)

    ...
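Putting the pieces together, a minimal sketch of the full spider might look like the following. The URL and the form field names are placeholders (assumptions), and this version is written for Python 3 byte-string responses, unlike the Python 2 code in the question:

    import scrapy
    from scrapy_splash import SplashRequest, SplashFormRequest


    class MySpider(scrapy.Spider):
        name = 'lost'
        # Hypothetical placeholder URL -- substitute the real login page.
        start_urls = ['https://example.com/login']

        def start_requests(self):
            # Render the start page through Splash so that from_response()
            # below sees the same HTML a browser would.
            for url in self.start_urls:
                yield SplashRequest(url, callback=self.parse)

        def parse(self, response):
            # Field names are assumptions -- match them to the real form.
            yield SplashFormRequest.from_response(
                response,
                formdata={'username': 'user', 'password': 'pass'},
                callback=self.after_login,
            )

        def after_login(self, response):
            # response.body is bytes on Python 3, hence the b'' literal.
            if b'keyword' in response.body:
                self.logger.info('Success')
            else:
                self.logger.error('Failed')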