I'm working on a Scrapy project using Python 3, and the spiders are deployed to Scrapinghub. I'm also using Google Cloud Storage to store the scraped files, as described in the official docs here.
The spiders run absolutely fine when I run them locally, and they get deployed to Scrapinghub without any errors. I'm using scrapy:1.4-py3 as the stack on Scrapinghub. While running the spiders there, I get the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 77, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 102, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 36, in from_settings
mw = mwcls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/media.py", line 68, in from_crawler
pipe = cls.from_settings(crawler.settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 95, in from_settings
return cls(store_uri, settings=settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 52, in __init__
download_func=download_func)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 234, in __init__
self.store = self._get_store(store_uri)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 269, in _get_store
store_cls = self.STORE_SCHEMES[scheme]
KeyError: 'gs'
PS: 'gs' is the URI scheme used in the path where the files are stored, e.g.
'IMAGES_STORE':'gs://<bucket-name>/'
I have researched this error, but haven't found any solutions so far. Any help would be much appreciated.
Google Cloud Storage support is a new feature in Scrapy 1.5, so you need to use the scrapy:1.5-py3 stack in Scrapy Cloud (typically by changing the stack entry in your scrapinghub.yml and redeploying).
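Once you're on a 1.5+ stack, the GCS storage backend also expects the project ID to be configured. Here's a minimal sketch of the relevant settings.py entries, with placeholder bucket and project values (and assuming the google-cloud-storage package is available in your requirements):

# settings.py -- minimal GCS setup for the images pipeline (Scrapy >= 1.5)
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# 'gs://' URIs are only understood by the files/images store in Scrapy 1.5+;
# on older versions this raises KeyError: 'gs'.
IMAGES_STORE = 'gs://<bucket-name>/'

# Google Cloud project that owns the bucket (placeholder value).
GCS_PROJECT_ID = 'my-gcp-project-id'

The GCS client also needs credentials at runtime (e.g. via the GOOGLE_APPLICATION_CREDENTIALS service-account mechanism), so make sure those are available in the Scrapy Cloud environment as well.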