I'm using the following code in python to flatten the json structure below, however it doesn't work for all levels. I'm interested in the tags.tags column data specifically shown on the picture below
pd.json_normalize(data['conversations'])
pd.set_option('display.max_columns', None)
Json structure:
[{'type': 'conversation.list',
'pages': {'type': 'pages',
'next': {'page': 3,
'starting_after': 'WzE3MTU3ODIxOTc=='},
'page': 2,
'per_page': 1,
'total_pages': 46969},
'total_count': 46969,
'conversations': [{'type': 'conversation',
'id': '1384780',
'created_at': 1715780970,
'updated_at': 1715782197,
'waiting_since': None,
'snoozed_until': None,
'source': {'type': 'conversation',
'id': '2197597651',
'delivered_as': 'customer_initiated',
'subject': '',
'body': '<p>Outros</p>',
'author': {'type': 'user',
'id': '64ac5cacccd2b047',
'name': 'Claudiney Pinho',
'email': '[email protected]'},
'attachments': [],
'url': None,
'redacted': False},
'contacts': {'type': 'contact.list',
'contacts': [{'type': 'contact',
'id': '64accccd71982047',
'external_id': 'b363-cc8f--fb270b5e72e8'}]},
'first_contact_reply': {'created_at': 1715780970,
'type': 'conversation',
'url': None},
'admin_assignee_id': 5614527,
'team_assignee_id': 5045796,
'open': False,
'state': 'closed',
'read': True,
'tags': {'type': 'tag.list',
'tags': [{'type': 'tag',
'id': '5379642',
'name': '[BOT] Other',
'applied_at': 1715781024,
'applied_by': {'type': 'admin', 'id': '4685750'}},
{'type': 'tag',
'id': '5379660',
'name': '[BOT] Connected Agent',
'applied_at': 1715781025,
'applied_by': {'type': 'admin', 'id': '4685750'}},
{'type': 'tag',
'id': '5379654',
'name': '[BOT] Not Resolved',
'applied_at': 1715781027,
'applied_by': {'type': 'admin', 'id': '4685750'}},
{'type': 'tag',
'id': '7046337',
'name': '[BOT] Portuguese',
'applied_at': 1715781029,
'applied_by': {'type': 'admin', 'id': '4685750'}}]},
'priority': 'not_priority',
'sla_applied': None,
'statistics': {'type': 'conversation_statistics',
'time_to_assignment': 0,
'time_to_admin_reply': 189,
'time_to_first_close': 1158,
'time_to_last_close': 1228,
'median_time_to_reply': 139,
'first_contact_reply_at': 1715780970,
'first_assignment_at': 1715780970,
'first_admin_reply_at': 1715781159,
'first_close_at': 1715782128,
'last_assignment_at': 1715781159,
'last_assignment_admin_reply_at': 1715781159,
'last_contact_reply_at': 1715782179,
'last_admin_reply_at': 1715782125,
'last_close_at': 1715782198,
'last_closed_by_id': 5614527,
'count_reopens': 1,
'count_assignments': 3,
'count_conversation_parts': 28},
'conversation_rating': None,
'teammates': {'type': 'admin.list',
'admins': [{'type': 'admin', 'id': '5614527'}]},
'title': None,
'custom_attributes': {'Language': 'Portuguese',
'Conversation status': 'Open',
'From': 'iOS / Android'},
'topics': {'type': 'topic.list', 'topics': [], 'total_count': 0},
'ticket': None,
'linked_objects': {'type': 'list',
'data': [],
'total_count': 0,
'has_more': False}}]}]
I wish to create a pandas data frame with flatten key:values in a table
You can try using pd.json_normalize
function with the record_path and meta parameters.
This should flatten the JSON to different multiple levels of your choice, you can then further extract each nested lists or dictionaries into columns, which is of interest to you.
I have provided a code snippet for you, I am not using jupyter
so you might need to test it but in theory this should work nicely, since the concept is true.
import pandas as pd
df = pd.json_normalize(
data[0]["conversations"],
record_path=["tags", "tags"],
meta=[
"id",
"created_at",
"updated_at",
["source", "id"],
["source", "author", "name"],
],
meta_prefix="meta_", # trying to avoid conflicts with available ids
errors="ignore",
)
pd.set_option("display.max_columns", None)
Also note you have many NaN which caused errors, also I had to use meta_prefix for id
.
applied_by.id meta_id meta_created_at meta_updated_at meta_source.id \
0 4685750 1384780 1715780970 1715782197 NaN
1 4685750 1384780 1715780970 1715782197 NaN
2 4685750 1384780 1715780970 1715782197 NaN
3 4685750 1384780 1715780970 1715782197 NaN
meta_source.author.name
0 NaN
1 NaN
2 NaN
3 NaN