Tags: python, json, pandas, pandas-explode

Explode function


This is my first question on here. I have searched around here and throughout the web, and I can't seem to find an answer to my question. I'm trying to explode a list in a JSON file out into multiple columns and rows. Everything I have tried so far has been unsuccessful.

I am doing this over multiple JSON files within a directory, so that the dataframe prints out like this. Goal:

did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc
did Version Nodes rds time c sc f uc

Instead I get this in my dataframe:

did Version Nodes rds fusage
did Version Nodes rds everything in fusage
did Version Nodes rds everything in fusage
did Version Nodes rds everything in fusage

Example of the JSON I'm working with (the JSON structure will not change):

{
  "did": "123456789",
  "mId": "1a2b3cjsks",
  "timestamp": "2021-11-26T11:10:58.322000",
  "beat": {
    "did": "123456789",
    "collectionTime": "2010-05-26 11:10:58.004783+00",
    "Nodes": 6,
    "Version": "v1.4.6-2",
    "rds": "0.00B",
    "fusage": [
      {
        "time": "2010-05-25",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "time": "2010-05-19",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "t": "2010-05-23",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      },
      {
        "time": "2010-05-23",
        "c": "string",
        "sc": "string",
        "f": "string",
        "uc": "int"
      }
    ]
  }
}

My end goal is to get the dataframe out to a CSV so it can be ingested. I appreciate everyone's help looking at this.

Using Python 3.8.10 & pandas 1.3.4.

Python code below:

import csv
import glob
import json
import os
import pandas as pd

tempdir = '/dir/to/files/json_temp'
json_files = os.path.join(tempdir, '*.json')
file_list = glob.glob(json_files)
dfs = []

for file in file_list:
    with open(file) as f:
        data = pd.json_normalize(json.loads(f.read()))
        dfs.append(data)
        df = pd.concat(dfs, ignore_index=True)
        df.explode('fusage')
        print(df)

Solution

  • If you're going to use the explode function, apply pd.Series afterwards over the exploded column containing the fusage list (beat.fusage), so that each dict in the list expands into its own columns.

    /dir/to/files
    ├── example-v1.4.6-2.json
    └── example-v2.2.2-2.json
    
    ...
    for file in file_list:
        with open(file) as f:
            data = pd.json_normalize(json.loads(f.read()))
            dfs.append(data)
    
    df = pd.concat(dfs, ignore_index=True)
    fusage_list = df.explode('beat.fusage')['beat.fusage'].apply(pd.Series)
    df = pd.concat([df, fusage_list], axis=1)
    
    # show desired columns
    df = df[['did', 'beat.Version', 'beat.Nodes', 'beat.rds', 'time', 'c', 'sc', 'f', 'uc']]
    print(df)
    

    Output from df

             did beat.Version  beat.Nodes beat.rds        time       c      sc       f   uc
    0  123456789     v1.4.6-2           6    0.00B  2010-05-25  string  string  string  int
    0  123456789     v1.4.6-2           6    0.00B  2010-05-19  string  string  string  int
    0  123456789     v1.4.6-2           6    0.00B         NaN  string  string  string  int
    0  123456789     v1.4.6-2           6    0.00B  2010-05-23  string  string  string  int
    1  123777777     v2.2.2-2           4    0.00B  2010-05-25  string  string  string  int
    1  123777777     v2.2.2-2           4    0.00B  2010-05-19  string  string  string  int
    1  123777777     v2.2.2-2           4    0.00B         NaN  string  string  string  int
    1  123777777     v2.2.2-2           4    0.00B  2010-05-23  string  string  string  int
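
    Since the end goal is to get the dataframe out to a CSV for ingestion, a minimal
    follow-up sketch (the filename here is only a placeholder, not from the original
    post; index=False keeps the repeated 0/1 index labels out of the file):

    # write the flattened dataframe to a CSV file for downstream ingestion
    df.to_csv('flattened.csv', index=False)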