I have DataFrame which contains dictionaries in the columns.
Can be created as below
lis = [
{'id': '1',
'author': {'self': 'A',
'displayName': 'A'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Backlog'}},
{'id': '2',
'author': {'self': 'B',
'displayName': 'B'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Funnel'}}]
pd.DataFrame(lis)
author created id items
0 {'self': 'A', 'displayName': 'A'} 2018-12-18 1 {'field': 'status', 'fromString': 'Backlog'}
1 {'self': 'B', 'displayName': 'B'} 2018-12-18 2 {'field': 'status', 'fromString': 'Funnel'}
I want to convert this info multi level DataFrame.
I have been trying with
pd.MultiIndex.from_product(lis)
pd.MultiIndex.from_frame(pd.DataFrame(lis))
But not able to get the result i am looking for.Basically i want like below:
author created id items
self displayName field fromString
A A 2018-12-18 1 status Backlog
B B 2018-12-18 2 status Funnel
Any suggestions on how i can achieve this ?
Thanks
You can use json.json_normalize
- but columns names are flattened with .
separator:
from pandas.io.json import json_normalize
lis = [
{'id': '1',
'author': {'self': 'A',
'displayName': 'A'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Backlog'}},
{'id': '2',
'author': {'self': 'B',
'displayName': 'B'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Funnel'}}]
df = json_normalize(lis)
print (df)
id created author.self author.displayName items.field items.fromString
0 1 2018-12-18 A A status Backlog
1 2 2018-12-18 B B status Funnel
For MulitIndex
in columns and in index - first create Mulitiindex
by all columns without .
by DataFrame.set_index
and then use str.split
:
df = df.set_index(['id','created'])
df.columns = df.columns.str.split('.', expand=True)
print (df)
author items
self displayName field fromString
id created
1 2018-12-18 A A status Backlog
2 2018-12-18 B B status Funnel
If need MulitIndex
in columns - it is possible, but get missing values in columns names:
df.columns = df.columns.str.split('.', expand=True)
print (df)
id created author items
NaN NaN self displayName field fromString
0 1 2018-12-18 A A status Backlog
1 2 2018-12-18 B B status Funnel
Missing values should be replaced by empty string:
df = df.rename(columns= lambda x: '' if x != x else x)
print (df)
id created author items
self displayName field fromString
0 1 2018-12-18 A A status Backlog
1 2 2018-12-18 B B status Funnel