Tags: scrapy, python-requests, scrapyd

JSONDecodeError with Scrapy: Expecting value: line 1 column 1 (char 0)


I am using requests to fetch and parse data scraped by Scrapy through ScrapyRT (real-time scraping).

This is how I do it:

import requests

# Pass the spider to ScrapyRT as request parameters
spider = 'precious_tracks'  # spider name, as it appears in the log below
params = {
    'spider_name': spider,
    'start_requests': True,
}
# Scrape items
response = requests.get('http://scrapyrt:9080/crawl.json', params=params)
print('RESPONSE JSON', response.json())
data = response.json()
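Calling response.json() directly hides the HTTP status: when the server answers with an error status, the body may not be JSON at all, and the only symptom is a JSONDecodeError. A minimal defensive sketch, assuming the same crawl.json endpoint; crawl_json and ScrapyRTError are hypothetical names, not part of requests or ScrapyRT. It checks the status first and reports the start of the body when decoding is not possible.

```python
import json


class ScrapyRTError(RuntimeError):
    """Hypothetical error type: a crawl.json call did not return usable JSON."""


def crawl_json(response):
    """Decode a requests-style response from crawl.json, or raise ScrapyRTError.

    Only .status_code and .text are used, so a requests.Response works here
    without this module importing requests.
    """
    body = response.text
    if response.status_code != 200:
        # An error status (like the 500 in the question's log) usually means the
        # body is an error page or message, so report it instead of decoding it.
        raise ScrapyRTError(f"HTTP {response.status_code}: {body[:300]!r}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Keep the original "Expecting value ..." message and show the body start.
        raise ScrapyRTError(f"body is not JSON ({exc}): {body[:300]!r}") from exc
```

With the question's code, the call would be data = crawl_json(requests.get('http://scrapyrt:9080/crawl.json', params=params)). For a 500 response like the one in the log, this raises ScrapyRTError with the server's status and the first part of its error body, which is usually more informative than a bare JSONDecodeError.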

According to the ScrapyRT documentation, setting the start_requests parameter to True makes ScrapyRT run the spider's start_requests method, which requests the spider's start URLs and passes each response to parse, the default callback for parsing responses:

start_requests (type: boolean, optional)

Whether spider should execute Scrapy.Spider.start_requests method. start_requests are executed by default when you run Scrapy Spider normally without ScrapyRT, but this method is NOT executed in API by default. By default we assume that spider is expected to crawl ONLY url provided in parameters without making any requests to start_urls defined in Spider class. start_requests argument overrides this behavior. If this argument is present API will execute start_requests Spider method.
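Because this is a GET request, requests serializes the params dict into the query string, so the boolean reaches ScrapyRT as the text True. The sketch below builds the same URL with the standard library; crawl_url is a hypothetical helper, and it uses urllib.parse.urlencode instead of requests, which produces the same query string for these two parameters.

```python
from urllib.parse import urlencode


def crawl_url(spider_name, base_url="http://scrapyrt:9080", start_requests=True):
    """Build the GET URL for ScrapyRT's crawl.json (hypothetical helper).

    urlencode() calls str() on non-string values, so the boolean is sent as
    the text "True", exactly as it appears in the request line of the log.
    """
    query = urlencode({"spider_name": spider_name, "start_requests": start_requests})
    return f"{base_url}/crawl.json?{query}"
```

crawl_url("precious_tracks") returns 'http://scrapyrt:9080/crawl.json?spider_name=precious_tracks&start_requests=True', the same path and query that appear in the GET line of the log, which confirms the parameters were sent as intended.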

But the setup does not work. Here is the log:

[2019-05-19 06:11:14,835: DEBUG/ForkPoolWorker-4] Starting new HTTP connection (1): scrapyrt:9080
[2019-05-19 06:11:15,414: DEBUG/ForkPoolWorker-4] http://scrapyrt:9080 "GET /crawl.json?spider_name=precious_tracks&start_requests=True HTTP/1.1" 500 7784
[2019-05-19 06:11:15,472: ERROR/ForkPoolWorker-4] Task project.api.routes.background.scrape_allmusic[87dbd825-dc1c-4789-8ee0-4151e5821798] raised unexpected: JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/src/app/project/api/routes/background.py", line 908, in scrape_allmusic
    print ('RESPONSE JSON',response.json())
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
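The last line of the traceback is the message json.loads raises whenever the text does not begin with a JSON value, for example an empty body or an HTML error page, so it points at the response body rather than at the parsing code. A minimal standard-library reproduction; decode_or_describe is a hypothetical helper and the sample bodies are made up:

```python
import json


def decode_or_describe(body):
    """Return (data, None) if body is JSON, else (None, the JSONDecodeError message)."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# Made-up bodies a failing server might send back; neither starts with a JSON value.
for sample in ["", "<html>error page</html>"]:
    print(repr(sample), "->", decode_or_describe(sample)[1])
    # each line ends with: Expecting value: line 1 column 1 (char 0)
```

By contrast, decode_or_describe('{"status": "ok"}') returns ({'status': 'ok'}, None). Together with the 500 status in the log, this suggests the server sent back an error body rather than crawl results.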

Solution

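  • The error was caused by a bug in Twisted 19.2.0, a ScrapyRT dependency, which assumed the response to be of the wrong type. As a result, ScrapyRT answered with HTTP 500 (in the log, "500 7784" is the status code followed by the response size in bytes) and a body that was not JSON, which is what response.json() failed to decode.

    After downgrading to Twisted==18.9.0, it worked.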
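To keep a later rebuild from pulling Twisted 19.2.0 back in, the working version can be pinned next to ScrapyRT. A sketch, assuming the ScrapyRT service installs its packages from a requirements.txt; the file layout is hypothetical:

```text
# requirements.txt for the ScrapyRT service (hypothetical layout)
scrapyrt
Twisted==18.9.0
```

pip install -r requirements.txt then installs Twisted 18.9.0 alongside ScrapyRT. The exact pin mirrors the version that worked here; whether later Twisted releases fix the bug is not established in this thread.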