Tags: python, json, import.io, jsonlines

JSON Lines issue when loading data from import.io using Python


I'm having a hard time trying to load an API response from import.io into a file or a list.

The endpoint I'm using is https://data.import.io/extractor/{0}/json/latest?_apikey={1} (where {0} is the extractor ID and {1} is my API key).

Previously all my scripts were set up to use regular JSON and everything worked well, but import.io has now switched to JSON Lines, and the output somehow seems malformed.
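For reference, JSON Lines means one complete JSON value per line, so each line must be decoded on its own rather than parsing the whole payload at once. A minimal sketch of that idea, assuming Python 3; `parse_json_lines` is a hypothetical helper and the sample records are made up:

```python
import json

def parse_json_lines(text):
    """Parse a JSON Lines string: one complete JSON value per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Skip blank lines, e.g. an empty line between records
            continue
        records.append(json.loads(line))
    return records

# Made-up sample with two records, shaped loosely like the import.io output
sample = ('{"url": "https://www.example.com/a", "result": {}}\n'
          '{"url": "https://www.example.com/b", "result": {}}\n')
rows = parse_json_lines(sample)
print(len(rows))       # 2
print(rows[1]["url"])  # https://www.example.com/b
```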

To adapt my scripts, I read the API response as follows:

import json
import requests

url_call = 'https://data.import.io/extractor/{0}/json/latest?_apikey={1}'.format(extractors_row_dict['id'], auth_key)
r = requests.get(url_call)

# Save the response to a temporary file
with open(temporary_json_file_path, 'w') as outfile:
    json.dump(r.content, outfile)

# Read it back, expecting one JSON object per line
data = []
with open(temporary_json_file_path) as f:
    for line in f:
        data.append(json.loads(line))

The problem is that when I check data[0], the entire content of the JSON file has been dumped into it, and there is no second element. Accessing data[1] raises:

IndexError: list index out of range

Here is an example of data[0][:300]:

u'{"url":"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de","result":{"extractorData":{"url":"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de","resourceId":"23455234","data":[{"group":[{"Brand":[{"text":"Brand","href":"https://www.example.com'

Does anyone have experience with the responses from this API? All the other JSON Lines sources I read work fine; only this one fails.

EDIT based on comment:

print repr(open(temporary_json_file_path).read(300))

gives this:

'"{\\"url\\":\\"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de\\",\\"result\\":{\\"extractorData\\":{\\"url\\":\\"https://www.example.com/de/shop?condition[0]=new&page=1&lc=DE&l=de\\",\\"resourceId\\":\\"df8de15cede2e96fce5fe7e77180e848\\",\\"data\\":[{\\"group\\":[{\\"Brand\\":[{\\"text\\":\\"Bra'
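The leading `"` and the escaped `\\"` quotes in that repr are what you get when a string that already contains JSON is JSON-encoded a second time. A small sketch reproducing the effect, assuming Python 3; the two sample records are made up:

```python
import json

# Two JSON Lines records, as the API would send them (made-up sample data)
raw = ('{"url": "https://www.example.com/a", "result": {}}\n'
       '{"url": "https://www.example.com/b", "result": {}}\n')

# json.dumps() on a *string* serializes it as a single JSON string literal:
# it is wrapped in quotes, inner quotes become \" and newlines become \n escapes.
double_encoded = json.dumps(raw)
print("\n" in double_encoded)     # False: the file would hold a single line

# Decoding that single line only removes the extra layer,
# so you get the original text back as one str, not a dict.
first_pass = json.loads(double_encoded)
print(type(first_pass).__name__)  # str
print(first_pass == raw)          # True

# The real records only appear after splitting and decoding each line.
records = [json.loads(line) for line in first_pass.splitlines() if line]
print(len(records))               # 2
```

This matches the symptoms above: the loop sees one line, so data has one element, and that element is a string rather than a dict.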

Solution

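  • You have a bug in your code: you are encoding the response twice. r.content is already JSON Lines text, and json.dump() serializes that whole string as a single JSON string literal (wrapped in quotes, with every inner quote escaped and every newline turned into a \n escape). That is why your file starts with "{\"url\"..., why it contains only one line, and why json.loads() on that line gives you back one big string instead of a dict:

    with open(temporary_json_file_path, 'w') as outfile:
        json.dump(r.content, outfile)

    Instead, write the raw response unchanged. Opening the file in binary mode ('wb') works with r.content (bytes) on both Python 2 and Python 3:

    with open(temporary_json_file_path, 'wb') as outfile:
        # Write the response body as-is, with no JSON re-encoding
        outfile.write(r.content)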
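A self-contained sketch of the corrected flow (write the bytes unchanged, then decode line by line), assuming Python 3 and a UTF-8 response body. `save_and_load_json_lines` is a hypothetical helper name, and the demo bytes merely stand in for `r.content`:

```python
import json
import os
import tempfile

def save_and_load_json_lines(content, path):
    """Write raw response bytes unchanged, then parse the file as JSON Lines.

    `content` is assumed to be the bytes of an HTTP response body
    (e.g. requests' Response.content); `path` is any writable file path.
    """
    # Binary mode: store the bytes exactly as received, no JSON re-encoding
    with open(path, "wb") as outfile:
        outfile.write(content)

    data = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines such as a trailing empty line
                data.append(json.loads(line))
    return data

# Demo with made-up bytes standing in for r.content
payload = (b'{"url": "https://www.example.com/a", "result": {}}\n'
           b'{"url": "https://www.example.com/b", "result": {}}\n')
fd, tmp_path = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)
rows = save_and_load_json_lines(payload, tmp_path)
os.remove(tmp_path)
print(len(rows))  # 2
```

In the original script this would be `data = save_and_load_json_lines(r.content, temporary_json_file_path)`. If the temporary file isn't needed, `[json.loads(line) for line in r.iter_lines() if line]` parses the response directly; `iter_lines()` yields bytes by default, which `json.loads()` accepts on Python 3.6 and later.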