This is my first question on here. I have searched around on here and throughout the web and I seem unable to find the answer to my question. I'm trying to explode out a list in a json file out into multiple columns and rows. Everything I have tried so far has proven unsuccessful.
I am doing this over multiple json files within a directory in order to have it print out in the dataframe like this. Goal:
did | Version | Nodes | rds | time | c | sc | f | uc |
---|---|---|---|---|---|---|---|---|
did | Version | Nodes | rds | time | c | sc | f | uc |
did | Version | Nodes | rds | time | c | sc | f | uc |
did | Version | Nodes | rds | time | c | sc | f | uc |
Instead I get this in my dataframe:
did | Version | Nodes | rds | fusage |
---|---|---|---|---|
did | Version | Nodes | rds | everything in fusage |
did | Version | Nodes | rds | everything in fusage |
did | Version | Nodes | rds | everything in fusage |
example of the json I'm working with. The json structure will not change
{
"did": "123456789",
"mId": "1a2b3cjsks",
"timestamp": "2021-11-26T11:10:58.322000",
"beat": {
"did": "123456789",
"collectionTime": "2010-05-26 11:10:58.004783+00",
"Nodes": 6,
"Version": "v1.4.6-2",
"rds": "0.00B",
"fusage": [
{
"time": "2010-05-25",
"c": "string",
"sc": "string",
"f": "string",
"uc": "int"
},
{
"time": "2010-05-19",
"c": "string",
"sc": "string",
"f": "string",
"uc": "int"
},
{
"t": "2010-05-23",
"c": "string",
"sc": "string",
"f": "string",
"uc": "int"
},
{
"time": "2010-05-23",
"c": "string",
"sc": "string",
"f": "string",
"uc": "int"
}
]
}
}
My end goal is to get the dataframe out to a csv in order to be ingested. I appreciate everyone's help looking at this.
using python 3.8.10 & pandas 1.3.4
python code below
import csv
import glob
import json
import os
import pandas as pd
tempdir = '/dir/to/files/json_temp'
json_files = os.path.join(tempdir, '*.json')
file_list = glob.glob(json_files)
dfs = []
for file in file_list:
with open(file) as f:
data = pd.json_normalize(json.loads(f.read()))
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
df.explode('fusage')
print(df)
If you're going to use the explode function, after that, apply pd.Series
over the column containing the fusage
list (beat.fusage
) to obtain a Series for each list item.
/dir/to/files
├── example-v1.4.6-2.json
└── example-v2.2.2-2.json
...
for file in file_list:
with open(file) as f:
data = pd.json_normalize(json.loads(f.read()))
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
fusage_list = df.explode('beat.fusage')['beat.fusage'].apply(pd.Series)
df = pd.concat([df, fusage_list], axis=1)
# show desired columns
df = df[['did', 'beat.Version', 'beat.Nodes', 'beat.rds', 'time', 'c', 'sc', 'f', 'uc']]
print(df)
Output from df
did beat.Version beat.Nodes beat.rds time c sc f uc
0 123456789 v1.4.6-2 6 0.00B 2010-05-25 string string string int
0 123456789 v1.4.6-2 6 0.00B 2010-05-19 string string string int
0 123456789 v1.4.6-2 6 0.00B NaN string string string int
0 123456789 v1.4.6-2 6 0.00B 2010-05-23 string string string int
1 123777777 v2.2.2-2 4 0.00B 2010-05-25 string string string int
1 123777777 v2.2.2-2 4 0.00B 2010-05-19 string string string int
1 123777777 v2.2.2-2 4 0.00B NaN string string string int
1 123777777 v2.2.2-2 4 0.00B 2010-05-23 string string string int