This works (normalize data inline):
import pandas as pd
data = [
{
"action_type": "add_chat_item",
"author": {
"id": "abcde",
"images": [
{
"id": "source",
"url": "https://www.example.com"
},
{
"height": 32,
"id": "32x32",
"url": "https://www.example.com",
"width": 32
},
{
"height": 64,
"id": "64x64",
"url": "https://www.example.com",
"width": 64
}
],
"name": "user1"
},
"message": "the first user sent this",
"message_id": "abcde",
"message_type": "text_message",
"time_in_seconds": -999,
"time_text": "-0:00",
"timestamp": 2000
},
{
"action_type": "add_chat_item",
"author": {
"badges": [
{
"icon_name": "moderator",
"title": "Moderator"
}
],
"id": "abeee",
"images": [
{
"id": "source",
"url": "https://www.example.com"
},
{
"height": 32,
"id": "32x32",
"url": "https://www.example.com",
"width": 32
},
{
"height": 64,
"id": "64x64",
"url": "https://www.example.com",
"width": 64
}
],
"name": "user2"
},
"message": "message from the user two",
"message_id": "abeedk",
"message_type": "text_message",
"time_in_seconds": -111,
"time_text": "-1:11",
"timestamp": 100003
}
]
norm = pd.json_normalize(data)
norm.to_csv('test.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])
And this works, with the above data saved as file.json (read file and outputs columns including full author
column, to CSV):
import pandas as pd
with open('file.json', encoding='utf-8') as inputfile:
df = pd.read_json(inputfile)
df.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author','message'])
But this doesn't work with that same file:
import pandas as pd
with open('file.json', encoding='utf-8') as inputfile:
df = pd.read_json(inputfile)
norm = pd.json_normalize(df)
norm.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])
I get a bunch of errors, the last lines of which are:
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['timestamp', 'author.name', 'message'], dtype='object')] are in the [columns]"
This results in the same error:
with open('file.json', encoding='utf-8') as inputfile:
df = pd.json_normalize(inputfile)
df.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])
This results in a successful CSV, but only with new lines:
with open('file.json', encoding='utf-8') as inputfile:
df = pd.json_normalize(inputfile)
df.to_csv('file3.csv', encoding='utf-8', index=False)
I expected to output the author.name
column in the CSV when reading from the JSON file. Instead, the code errors out when it reads the file and normalizes it.
I think you need json.load
instead pd.read_json
first:
import json
with open('file.json', encoding='utf-8') as inputfile:
data = json.load(inputfile)
norm = pd.json_normalize(data)
norm.to_csv('file.csv', encoding='utf-8', index=False,
columns=['timestamp','author.name','message'])
Details:
print (norm.head(2))
action_type message message_id message_type \
0 add_chat_item the first user sent this abcde text_message
1 add_chat_item message from the user two abeedk text_message
time_in_seconds time_text timestamp author.id \
0 -999 -0:00 2000 abcde
1 -111 -1:11 100003 abeee
author.images author.name \
0 [{'id': 'source', 'url': 'https://www.example.... user1
1 [{'id': 'source', 'url': 'https://www.example.... user2
author.badges
0 NaN
1 [{'icon_name': 'moderator', 'title': 'Moderato...