Search code examples
pythonjsonexportexport-to-csv

Convert JSON objects to table structure in Python


I have some JSON with an 'items list. I want to convert it to semicolon separated file with headers

I want to produce lines in this format: id, date, order[x][0], order[x][1], ...

My issue is that I need to include headers, the names of the fields, and some order items do not have all the possible elements so I can't just join the values together.

for the JSON below I'd expect the following output:

id;date;prefix;number;quantity;code;index
06107;2023-09-25T01:51:04Z;VO;32233809;1;;4
06107;2023-09-25T01:51:04Z;VO;31438125;1;;4
06107;2023-09-25T10:00:51Z;VO;31407983;1;14;4
06107;2023-09-25T10:00:51Z;VO;986116;6;12;4

I'm guessing maybe some kind of panda usage would be the solution, but I'm not sure how to make the mapping and export.

        "items": [
            {
                "id": "06107",
                "date": "2023-09-25T01:51:04Z",
                "order": [
                    {
                        "prefix": "VO",
                        "number": "32233809",
                        "quantity": 1,
                        "index": 4
                    },
                    {
                        "prefix": "VO",
                        "number": "31438125",
                        "quantity": 1,
                        "index": 4
                    }
                ]
            },
            {
                "id": "06107",
                "date": "2023-09-25T10:00:51Z",
                "order": [
                    {
                        "prefix": "VO",
                        "number": "31407983",
                        "quantity": 1,
                        "code": 14,
                        "index": 4
                    },
                    {
                        "prefix": "VO",
                        "number": "986116",
                        "quantity": 6,
                        "code": 12,
                        "index": 4
                    }
                ]
            }
        ]

Solution

  • I think the easiest way to proceed is with a csv.DictWriter() passing extrasaction="ignore".

    Given:

    data = [
        {
            "id": "06107",
            "date": "2023-09-25T01:51:04Z",
            "order": [
                {"prefix": "VO", "number": "32233809", "quantity": 1, "index": 4},
                {"prefix": "VO", "number": "31438125", "quantity": 1, "index": 4}
            ]
        },
        {
            "id": "06107",
            "date": "2023-09-25T10:00:51Z",
            "order": [
                {"prefix": "VO", "number": "31407983", "quantity": 1, "code": 14, "index": 4},
                {"prefix": "VO", "number": "986116", "quantity": 6, "code": 12, "index": 4}
            ]
        }
    ]
    

    Then this code:

    import csv
    
    headers = ["id", "date", "prefix", "number", "quantity", "code", "index"]
    all_items = [
        {**item, **detail}
        for item in data
        for detail in item["order"]    
    ]
    
    with open("foo.csv", "w", newline="") as file_out:
        writer = csv.DictWriter(file_out, delimiter=";", fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(all_items)
    

    Will give you a file:

    id;date;prefix;number;quantity;code;index
    06107;2023-09-25T01:51:04Z;VO;32233809;1;;4
    06107;2023-09-25T01:51:04Z;VO;31438125;1;;4
    06107;2023-09-25T10:00:51Z;VO;31407983;1;14;4
    06107;2023-09-25T10:00:51Z;VO;986116;6;12;4
    

    A question was asked about how to proceed if the child attributes were unknown. A slight change to the structure of the program will allow us to accumulate the headers in a set.

    headers = set(["id", "date"])
    all_items = []
    for item in data:
        for detail in item["order"]:
            headers.update(detail.keys())
            all_items.append({**item, **detail})