I am able to import data from a JSON file using this code...
import requests
from pandas.io.json import json_normalize
url = "https://datameetgeobk.s3.amazonaws.com/image_list.json"
resp = requests.get(url=url)
df = json_normalize(resp.json()['Images'])
df.head()
But the column "BlockDeviceMappings" is actually a list, and each item in it has DeviceName and Ebs keys, which are a string and a dict respectively. How do I further normalize my dataframe so that all of those details end up in separate columns?
My screenshot does not match the one shown in the answer. The Ebs column (second from left) is still a dictionary.
import requests
import pandas as pd
url = "https://datameetgeobk.s3.amazonaws.com/image_list.json"
resp = requests.get(url=url)
resp = resp.json()
df = pd.json_normalize(resp['Images'])
inner_keys = [x for x in resp['Images'][0].keys() if x != 'BlockDeviceMappings']
df_bdm = pd.json_normalize(resp['Images'], record_path=['BlockDeviceMappings'], meta=inner_keys, errors='ignore')
If you only want the flattened BlockDeviceMappings data without the outer metadata columns, skip the meta argument:

bdm_df = pd.json_normalize(resp['Images'], record_path=['BlockDeviceMappings'])
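To see what record_path and meta each do without downloading the full file, here is a minimal sketch using a made-up two-mapping payload (the image names and IDs are invented for illustration):

```python
import pandas as pd

# Hypothetical miniature version of the Images payload, for illustration only.
images = [
    {
        "ImageId": "ami-001",
        "Name": "example-image",
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/sda1",
             "Ebs": {"VolumeSize": 80, "VolumeType": "gp2"}},
            {"DeviceName": "xvdb",
             "Ebs": {"VolumeSize": 100, "VolumeType": "gp2"}},
        ],
    }
]

# record_path explodes each list element into its own row; meta repeats the
# outer keys on every row so that information is not lost in the explosion.
flat = pd.json_normalize(images,
                         record_path=["BlockDeviceMappings"],
                         meta=["ImageId", "Name"])
print(flat.columns.tolist())
```

The nested Ebs dict is flattened into dot-separated column names (Ebs.VolumeSize, Ebs.VolumeType), one row per device mapping, with ImageId and Name repeated on each row.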
You will no doubt wonder why df has 39995 entries, while bdm_df has 131691 entries. This is because BlockDeviceMappings is a list of dicts of varying lengths:
bdm_len = [len(x) for x in df.BlockDeviceMappings]
max(bdm_len)
>>> 31
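The row counts are consistent: the exploded frame has exactly one row per list element, so its length equals the sum of the list lengths. A minimal sketch with made-up data (two images, four mappings total) demonstrating the relationship:

```python
import pandas as pd

# Toy data, invented for illustration: two images with 1 and 3 mappings.
images = [
    {"ImageId": "a", "BlockDeviceMappings": [{"DeviceName": "/dev/sda1"}]},
    {"ImageId": "b", "BlockDeviceMappings": [{"DeviceName": "/dev/sda1"},
                                             {"DeviceName": "xvdb"},
                                             {"DeviceName": "xvdc"}]},
]

df = pd.json_normalize(images)                                   # 2 rows
bdm_df = pd.json_normalize(images,
                           record_path=["BlockDeviceMappings"])  # 4 rows

# The exploded frame's length is the total number of list elements.
bdm_len = [len(x) for x in df.BlockDeviceMappings]
assert len(bdm_df) == sum(bdm_len)
```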
A sample BlockDeviceMappings entry:

[{'DeviceName': '/dev/sda1',
'Ebs': {'DeleteOnTermination': True,
'SnapshotId': 'snap-0aac2591b85fe677e',
'VolumeSize': 80,
'VolumeType': 'gp2',
'Encrypted': False}},
{'DeviceName': 'xvdb',
'Ebs': {'DeleteOnTermination': True,
'SnapshotId': 'snap-0bd8d7828225924a7',
'VolumeSize': 80,
'VolumeType': 'gp2',
'Encrypted': False}}]
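Passed through json_normalize on its own, each dict in such an entry becomes one row, and the nested Ebs dict is flattened into dot-separated columns. A quick check using the first mapping from the sample above:

```python
import pandas as pd

# First mapping from the sample BlockDeviceMappings entry shown above.
entry = [{'DeviceName': '/dev/sda1',
          'Ebs': {'DeleteOnTermination': True,
                  'SnapshotId': 'snap-0aac2591b85fe677e',
                  'VolumeSize': 80,
                  'VolumeType': 'gp2',
                  'Encrypted': False}}]

# json_normalize flattens the nested dict into 'Ebs.<key>' columns.
cols = pd.json_normalize(entry).columns.tolist()
print(cols)
```

This is why bdm_df (and df_bdm) carry columns like Ebs.SnapshotId and Ebs.VolumeSize instead of a single Ebs dict column.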
df_bdm.head()