I've been fighting to go from a nested dictionary of depth D to a pandas DataFrame.
I've tried with recursive function, like the following one, but my problem is that when I'm iterating over a KEY, I don't know what was the pervious key.
I've also tried with json.normalize, pandas from dict but I always end up with dots in the columns...
Example code:
def iterate_dict(d, i = 2, cols = []):
for k, v in d.items():
# missing here how to check for the previous key
# so that I can create an structure to create the dataframe.
if type(v) is dict:
print('this is k: ', k)
if i % 2 == 0:
cols.append(k)
i+=1
iterate_dict(v, i, cols)
else:
print('this is k2: ' , k, ': ', v)
iterate_dict(test2)
This is an example of how my dictionary looks like:
# example 2
test = {
'column-gender': {
'male': {
'column-country' : {
'FRENCH': {
'column-class': [0,1]
},
('SPAIN','ITALY') : {
'column-married' : {
'YES': {
'column-class' : [0,1]
},
'NO' : {
'column-class' : 2
}
}
}
}
},
'female': {
'column-country' : {
('FRENCH', 'SPAIN') : {
'column-class' : [[1,2],'#']
},
'REST-OF-VALUES': {
'column-married' : '*'
}
}
}
}
}
And this is how I want the dataframe to look like:
Any suggestion is welcome :)
If the column-keys are consistently prefixed with column-
, you can create a recursive function:
def data_to_df(data):
rec_out = []
def dict_to_rec(d, curr_row={}):
for k, v in d.items():
if 'column-' in k: # definition of a column
if isinstance(v, dict):
for val, nested_dict in v.items():
dict_to_rec(nested_dict, dict(curr_row, **{k[7:]: val}))
else:
rec_out.append(dict(curr_row, **{k[7:]: v}))
dict_to_rec(data)
return pd.DataFrame(rec_out)
print(data_to_df(test))
Edit: removing unnecessary variable and argument
Output:
gender country class married
0 male FRENCH [0, 1] NaN
1 male (SPAIN, ITALY) YES [0, 1]
2 male (SPAIN, ITALY) NO 2
3 female (FRENCH, SPAIN) [[1, 2], #] NaN
4 female REST-OF-VALUES * NaN