I have some JSON with an 'items list. I want to convert it to semicolon separated file with headers
I want to produce lines in this format: id, date, order[x][0], order[x][1], ...
My issue is that I need to include headers, the names of the fields, and some order
items do not have all the possible elements so I can't just join the values together.
for the JSON below I'd expect the following output:
id;date;prefix;number;quantity;code;index
06107;2023-09-25T01:51:04Z;VO;32233809;1;;4
06107;2023-09-25T01:51:04Z;VO;31438125;1;;4
06107;2023-09-25T10:00:51Z;VO;31407983;1;14;4
06107;2023-09-25T10:00:51Z;VO;986116;6;12;4
I'm guessing maybe some kind of panda usage would be the solution, but I'm not sure how to make the mapping and export.
"items": [
{
"id": "06107",
"date": "2023-09-25T01:51:04Z",
"order": [
{
"prefix": "VO",
"number": "32233809",
"quantity": 1,
"index": 4
},
{
"prefix": "VO",
"number": "31438125",
"quantity": 1,
"index": 4
}
]
},
{
"id": "06107",
"date": "2023-09-25T10:00:51Z",
"order": [
{
"prefix": "VO",
"number": "31407983",
"quantity": 1,
"code": 14,
"index": 4
},
{
"prefix": "VO",
"number": "986116",
"quantity": 6,
"code": 12,
"index": 4
}
]
}
]
I think the easiest way to proceed is with a csv.DictWriter()
passing extrasaction="ignore"
.
Given:
data = [
{
"id": "06107",
"date": "2023-09-25T01:51:04Z",
"order": [
{"prefix": "VO", "number": "32233809", "quantity": 1, "index": 4},
{"prefix": "VO", "number": "31438125", "quantity": 1, "index": 4}
]
},
{
"id": "06107",
"date": "2023-09-25T10:00:51Z",
"order": [
{"prefix": "VO", "number": "31407983", "quantity": 1, "code": 14, "index": 4},
{"prefix": "VO", "number": "986116", "quantity": 6, "code": 12, "index": 4}
]
}
]
Then this code:
import csv
headers = ["id", "date", "prefix", "number", "quantity", "code", "index"]
all_items = [
{**item, **detail}
for item in data
for detail in item["order"]
]
with open("foo.csv", "w", newline="") as file_out:
writer = csv.DictWriter(file_out, delimiter=";", fieldnames=headers, extrasaction="ignore")
writer.writeheader()
writer.writerows(all_items)
Will give you a file:
id;date;prefix;number;quantity;code;index
06107;2023-09-25T01:51:04Z;VO;32233809;1;;4
06107;2023-09-25T01:51:04Z;VO;31438125;1;;4
06107;2023-09-25T10:00:51Z;VO;31407983;1;14;4
06107;2023-09-25T10:00:51Z;VO;986116;6;12;4
A question was asked about how to proceed if the child attributes were unknown. A slight change to the structure of the program will allow us to accumulate the headers in a set.
headers = set(["id", "date"])
all_items = []
for item in data:
for detail in item["order"]:
headers.update(detail.keys())
all_items.append({**item, **detail})