Cannot parse response from sg.media-imdb in Python


I'm trying to parse response from https://sg.media-imdb.com/suggests/a/a.json in Python 3.6.8.

Here is my code:

import requests

url = 'https://sg.media-imdb.com/suggests/a/a.json'
data = requests.get(url).json()

I get this error:

$ /usr/bin/python3 /home/livw/Python/test_scrapy/phase_1.py
Traceback (most recent call last):
  File "/home/livw/Python/test_scrapy/phase_1.py", line 33, in <module>
    data = requests.get(url).json()
  File "/home/livw/.local/lib/python3.6/site-packages/requests/models.py", line 889, in json
    self.content.decode(encoding), **kwargs
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It seems the response is not valid JSON, although I can parse it with an online JSON formatter and validator.

How can I fix this and store the response in a JSON object?


Solution

  • This most likely happens because the response is not pure JSON: it is wrapped in a prefix and a suffix (a JSONP-style callback).

    The response starts with imdb$a( and ends with ). The JSON parser does not know how to handle that wrapper, so it fails. You can strip the wrapper and parse only the JSON inside it.

    You can do it like this:

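    import json
    import requests
    
    url = 'https://sg.media-imdb.com/suggests/a/a.json'
    text = requests.get(url).text
    # Keep only the JSON object between the first '{' and the last '}',
    # dropping the imdb$a( prefix and the trailing ).
    data = json.loads(text[text.index('{'):text.rindex('}') + 1])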
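
If you need this in more than one place, the slicing above can be wrapped in a small helper. The following is only a sketch: the name parse_jsonp and the regular expression are my own, and the sample string is made up to mimic the imdb$a( ... ) wrapper rather than copied from a real response. It runs offline, so you can check the unwrapping without hitting the server.

```python
import json
import re

# Hypothetical helper, not part of requests or json.
# Matches a callback wrapper such as  imdb$a({...})  or  cb([...]);
_JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_jsonp(text):
    """Decode a JSONP string; plain JSON is passed through unchanged."""
    match = _JSONP_RE.match(text)
    # If there is no wrapper, treat the whole text as JSON.
    payload = match.group(1) if match else text
    return json.loads(payload)


# Made-up sample that imitates the shape of the wrapped response.
sample = 'imdb$a({"q": "a", "d": [{"l": "Example"}]})'
result = parse_jsonp(sample)
print(result["q"])
```

With a live request you would call parse_jsonp(requests.get(url).text) instead of using the sample string. Falling back to plain json.loads when no wrapper is found means the helper should keep working if the endpoint ever returns bare JSON.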