Search code examples
jsonpython-3.xpandasexport-to-csv

KeyError with pandas json_normalize when opening JSON file, but not with JSON embedded in code (JSON to CSV)


This works (normalize data inline):

import pandas as pd

data = [
    {
        "action_type": "add_chat_item",
        "author": {
            "id": "abcde",
            "images": [
                {
                    "id": "source",
                    "url": "https://www.example.com"
                },
                {
                    "height": 32,
                    "id": "32x32",
                    "url": "https://www.example.com",
                    "width": 32
                },
                {
                    "height": 64,
                    "id": "64x64",
                    "url": "https://www.example.com",
                    "width": 64
                }
            ],
            "name": "user1"
        },
        "message": "the first user sent this",
        "message_id": "abcde",
        "message_type": "text_message",
        "time_in_seconds": -999,
        "time_text": "-0:00",
        "timestamp": 2000
    }, 
    {
        "action_type": "add_chat_item",
        "author": {
            "badges": [
                {
                    "icon_name": "moderator",
                    "title": "Moderator"
                }
            ],
            "id": "abeee",
            "images": [
                {
                    "id": "source",
                    "url": "https://www.example.com"
                },
                {
                    "height": 32,
                    "id": "32x32",
                    "url": "https://www.example.com",
                    "width": 32
                },
                {
                    "height": 64,
                    "id": "64x64",
                    "url": "https://www.example.com",
                    "width": 64
                }
            ],
            "name": "user2"
        },
        "message": "message from the user two",
        "message_id": "abeedk",
        "message_type": "text_message",
        "time_in_seconds": -111,
        "time_text": "-1:11",
        "timestamp": 100003
    }
]

norm = pd.json_normalize(data)
norm.to_csv('test.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])

And this works, with the above data saved as file.json (read file and outputs columns including full author column, to CSV):

import pandas as pd

with open('file.json', encoding='utf-8') as inputfile:
     df = pd.read_json(inputfile)

df.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author','message'])

But this doesn't work with that same file:

import pandas as pd

with open('file.json', encoding='utf-8') as inputfile:
     df = pd.read_json(inputfile)

norm = pd.json_normalize(df)
norm.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])

I get a bunch of errors, the last lines of which are:

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['timestamp', 'author.name', 'message'], dtype='object')] are in the [columns]"

This results in the same error:

with open('file.json', encoding='utf-8') as inputfile:
     df = pd.json_normalize(inputfile)

df.to_csv('file.csv', encoding='utf-8', index=False, columns=['timestamp','author.name','message'])

This results in a successful CSV, but only with new lines:

with open('file.json', encoding='utf-8') as inputfile:
     df = pd.json_normalize(inputfile)

df.to_csv('file3.csv', encoding='utf-8', index=False)

I expected to output the author.name column in the CSV when reading from the JSON file. Instead, the code errors out when it reads the file and normalizes it.


Solution

  • I think you need json.load instead pd.read_json first:

    import json
    
    with open('file.json', encoding='utf-8') as inputfile:    
        data = json.load(inputfile)
        
    norm = pd.json_normalize(data)
    norm.to_csv('file.csv', encoding='utf-8', index=False, 
                columns=['timestamp','author.name','message'])
    

    Details:

    print (norm.head(2))
         action_type                    message message_id  message_type  \
    0  add_chat_item   the first user sent this      abcde  text_message   
    1  add_chat_item  message from the user two     abeedk  text_message   
    
       time_in_seconds time_text  timestamp author.id  \
    0             -999     -0:00       2000     abcde   
    1             -111     -1:11     100003     abeee   
    
                                           author.images author.name  \
    0  [{'id': 'source', 'url': 'https://www.example....       user1   
    1  [{'id': 'source', 'url': 'https://www.example....       user2   
    
                                           author.badges  
    0                                                NaN  
    1  [{'icon_name': 'moderator', 'title': 'Moderato...