I am seeking to convert a deeply nested json string into a Pandas Dataframe with hierarchical multiindex based on the hierarchy within the json.
I don't know the structure of the json and this will be used on a list of jsons with will likely have different hierarchies so this needs to be dynamic and cannot be hardcoded.
For the purpose of this question I'm using the following the "Example 2" data from the following link (though the actual data is much larger and more deeply nested): https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files
I am using the following code to interatively flatten the json:
import itertools as it
import pandas as pd
def flatten_json (dictionary):
def unpack(parent_key,parent_value):
if isinstance (parent_value,dict):
for key, value in parent_value.items():
temp = parent_key + "-" + key
print("parent_key: ", parent_key," key: ", key)
yield temp, value
elif isinstance(parent_value, list):
i = 0
for value in parent_value:
temp2 = parent_key + "_" + str(i)
i += 1
yield temp2, value
else:
yield parent_key, parent_value
while True:
dictionary = dict(it.chain.from_iterable(it.starmap(unpack,dictionary.items())))
if not any(isinstance(value,dict) for value in dictionary.values()) and not any(isinstance(value,list) for value in dictionary.values()):
break
return dictionary
data = ***json data from link above as a string - omitting due to length***
output_table = pd.Series(flatten_json(data)).to_frame()
The produces the following output:
I want to produce the following output:
Found a possible solution with some playing around from the dictionary output from the original flatten_json() function. Not sure how efficient this is but it seems to work:
dictionary = flatten_json(data2)
all_values = list(dictionary.values())
index_list = []
for key in dictionary:
x = tuple(key.split("-"))
index_list.append(x)
index = pd.MultiIndex.from_tuples(index_list)
df = pd.Series(all_values, index=index).to_frame()
Any suggestions or comments welcome...