I am trying to get some data from an open API:
https://data.brreg.no/enhetsregisteret/api/enheter/lastned
but I am having difficulties understanding the different type of objects and the order the conversions should be in. Is it strings
to bytes
, is it BytesIO
or StringIO
, is it decode('utf-8)
or decode('unicode)
etc..?
So far:
url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
encoding = response.info().get_param('charset', 'utf8')
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)
and now is where I am stuck, how should I write the next line of code?
json_str = json.loads(decompressed_file.read().decode('utf-8'))
My workaround is if I write it as a json file then read it in again and do the transformation to df then it works:
with io.open('brreg.json', 'wb') as f:
f.write(decompressed_file.read())
with open(f_path, encoding='utf-8') as fin:
d = json.load(fin)
df = json_normalize(d)
with open('brreg_2.csv', 'w', encoding='utf-8', newline='') as fout:
fout.write(df.to_csv())
I found many SO posts about it, but I am still so confused. This first one explains it quite good, but I still need some spoon feeding.
Python 3, read/write compressed json objects from/to gzip file
TypeError when trying to convert Python 2.7 code to Python 3.4 code
How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?
It works fine for me using the decompress
function rather than the GZipFile
class to decompress the file, but not sure why yet...
import urllib.request
import gzip
import io
import json
url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
encoding = response.info().get_param('charset', 'utf8')
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.decompress(compressed_file.read())
json_str = json.loads(decompressed_file.decode('utf-8'))
EDIT, in fact the following also works fine for me which appears to be your exact code...
(Further edit, turns out it's not quite your exact code because your final line was outside the with block which meant response
was no longer open when it was needed - see comment thread)
import urllib.request
import gzip
import io
import json
url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
encoding = response.info().get_param('charset', 'utf8')
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)
json_str = json.loads(decompressed_file.read().decode('utf-8'))