I am trying to use scrapy view https://www.example.com (not the real link, since my job doesn't allow me to disclose it, sorry) to debug the page, but I get this error:
2018-11-01 20:49:29 [twisted] CRITICAL: Unhandled error in Deferred:
2018-11-01 20:49:29 [twisted] CRITICAL:
Traceback (most recent call last):
File "d:\kerja\hit\python projects\my_project\my_project-env\lib\site-packages\twisted\internet\defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "d:\kerja\hit\python projects\my_project\my_project-env\lib\site-packages\scrapy\crawler.py", line 98, in crawl
six.reraise(*exc_info)
File "d:\kerja\hit\python projects\my_project\my_project-env\lib\site-packages\scrapy\crawler.py", line 79, in crawl
self.spider = self._create_spider(*args, **kwargs)
File "d:\kerja\hit\python projects\my_project\my_project-env\lib\site-packages\scrapy\crawler.py", line 102, in _create_spider
return self.spidercls.from_crawler(self, *args, **kwargs)
File "d:\kerja\hit\python projects\my_project\my_project-env\lib\site-packages\scrapy\spiders\__init__.py", line 51, in from_crawler
spider = cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'start_requests'
'page' is not recognized as an internal or external command,
operable program or batch file.
How can I avoid this error?
UPDATE:
I get this error in one of my Scrapy projects, but not in my other Scrapy project, so it seems to be a problem with the spider itself.
1.
As mentioned by Elena in her answer, the sample command you gave wasn't quoted. You'll need to handle the & character properly (by quoting the command, or at least escaping that character) to pass the right URL to Scrapy as an argument.
While this is something that needs to be resolved, I don't think it's the cause of the TypeError you're currently getting.
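The effect of the unquoted & is easy to demonstrate: cmd.exe treats it as a command separator (hence the "'page' is not recognized ..." line in your log), and POSIX shells treat it as a background operator, so only part of the URL ever reaches Scrapy. A minimal sketch of the difference, counting how many arguments actually arrive (the count_args helper and the query string are made up for illustration):

```shell
# Quote the whole URL so '&' is passed through literally, e.g.:
#   scrapy view "https://www.example.com/?foo=1&page=2"
#
# count_args simply reports how many arguments it received:
count_args() { printf '%s\n' "$#"; }

count_args "https://www.example.com/?foo=1&page=2"   # prints 1 (one intact URL)
```

Unquoted, the same URL would never arrive as a single argument; everything after the & would be interpreted by the shell instead.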
2.
When handling commands like scrapy fetch and scrapy view, Scrapy needs to initialize a scrapy.Spider instance for the task. During the process, Scrapy looks for a scrapy.cfg file at the current path, and:
A. if one exists, it tries to use a matching scrapy.Spider class from the project within;
B. if not, it falls back to a default scrapy.Spider instance.
According to the log you shared, it's case A you're having.
What's more, when handling a scrapy fetch command, Scrapy tries to override the start_requests attribute via spider arguments (related code here). According to the log you shared, your spider's __init__ does not accept such a keyword argument.
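The failure can be reproduced without Scrapy at all: a subclass whose __init__ does not accept **kwargs rejects any extra keyword argument. A minimal sketch (Base and StrictSpider are made-up names, not Scrapy classes):

```python
class Base:
    def __init__(self, **kwargs):
        # scrapy.Spider.__init__ similarly stores unknown keyword
        # arguments as instance attributes
        self.__dict__.update(kwargs)

class StrictSpider(Base):
    def __init__(self, name):  # no **kwargs: extra arguments are rejected
        super().__init__()
        self.name = name

try:
    StrictSpider(name='test', start_requests=True)
except TypeError as exc:
    print(exc)  # ... got an unexpected keyword argument 'start_requests'
```

This is exactly the shape of the TypeError in your traceback: cls(*args, **kwargs) is called with start_requests, and your __init__ signature has no place for it.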
Thus you may try either of these approaches:
A. Run the command outside your project directory (e.g. cd /tmp/ first), then retry the same scrapy fetch command.
B. Fix your spider's __init__ method so it accepts extra keyword arguments, then retry the scrapy fetch command.
In either case, you might also need to fix the scrapy fetch command itself, as mentioned in #1.
3.
Sample code for proposal B above:
import scrapy

class TestSpider(scrapy.Spider):
    name = 'test'

    def __init__(self, argument_foo=None, argument_bar=None, *args, **kwargs):
        # defaults keep the spider constructible even when the arguments
        # are not supplied (e.g. when run via scrapy fetch)
        super().__init__(*args, **kwargs)
        # handle your arguments "foo" and "bar" here
        # e.g. self.xxx = int(argument_foo)
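With defaults and **kwargs forwarding in place, the spider can be instantiated the way scrapy fetch does it, i.e. with an extra start_requests keyword. A quick check, using a made-up stand-in for scrapy.Spider so it runs without Scrapy installed (the real base class also stores unknown keyword arguments as instance attributes):

```python
class Spider:
    # stand-in for scrapy.Spider (illustration only)
    def __init__(self, *args, **kwargs):
        self.__dict__.update(kwargs)

class TestSpider(Spider):
    name = 'test'

    def __init__(self, argument_foo=None, argument_bar=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.argument_foo = argument_foo
        self.argument_bar = argument_bar

spider = TestSpider(start_requests=True)  # no TypeError any more
print(spider.start_requests)              # True
```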