I'm trying to process a json file for use in another program that will use an excel file. My json file has multiple sections/arrays, one for some basic info like number of records, report name. It has sections/arrays for column names and then another that has each individual row/record as an array.
I've tried using pandas.read_json and the json_loads options to read the data in but I continually get errors. If I remove all sections except just one (like the rows) section, I can get it to read that in (although it puts all columns in one column like it's not identifying each column name separated by a comma as a separate column.
Ideally I'd like to not need to manually edit this file and just process it in python into a pandas dataframe so I can further manipulate it and export it for other use.
Any help would be greatly appreciated. This is my first post so if there's anything I can do better let me know!
Here's a representation of the json data, the actual data has several more columns and many more rows/records (700+ usually)
{
"count": 2,
"name": "report1",
"columnNames": [
"Incident ID",
"Status",
"Subject"
],
"rows": [
[
"2460636",
"Resolved",
"login help"
],
[
"2460637",
"Unresolved",
"email help"
]
]
}
I'm trying to get the columnNames section to become the column names in a pandas dataframe and each "row" to be a record in the dataframe.
I've tried looking at other examples but I haven't come across a similar issue with json formatted like this.
I've tried using pandas.read_json("example.json") as well as the json.loads to load the data to get it in but they all come up with different errors that I can't seem to get around.
When Running the pandas.read_json("example.json") it comes back saying "arrays must all be same length".
Result should be the columnNames section/array should be the column names of the pandas dataframe and then each "row" I want to be a record in the dataframe.
pd.json_normalize
: to unpack your json
fileimport pandas as pd
import json
with open('test.json') as f:
data = json.load(f)
json_data = pd.json_normalize(data)
Output:
columnNames count name rows
0 [Incident ID, Status, Subject] 2 report1 [[2460636, Resolved, login help], [2460637, Un...
Unpack rows
:
df_rows = pd.json_normalize(data, record_path=['rows'], meta=['name'])
df_rows.rename({0: data['columnNames'][0],
1: data['columnNames'][1],
2: data['columnNames'][2]}, axis=1, inplace=True)
Output of df_row
:
Incident ID Status Subject name
0 2460636 Resolved login help report1
1 2460637 Unresolved email help report1
The json
isn't particularly well formed, something like the following would be easier to unpack:
{
"count": 2,
"name": "report1",
"rows": [{
"Incident ID": "2460636",
"Status": "Resolved",
"Subject": "login help"
}, {
"Incident ID": "2460637",
"Status": "Unresolved",
"Subject": "email help"
}
]
}