Tags: json, pandas, export-to-csv, jq

JSON to CSV issues


I am using pandas to normalize some JSON data. I am stuck on the case where more than one field may hold either a single value or an array.

If I use record_path on Car, it breaks on the second.

Any pointers on how to get something like this to create a line in the csv per Car and per Location?

[
    {
        "Name": "John Doe",
        "Car": [
            "Car1",
            "Car2"
        ],
        "Location": "Texas"
    },
    {
        "Name": "Jane Roe",
        "Car": "Car1",            
        "Location": [
            "Illinois",
            "Kansas"
        ]
    }
]

Here is the output:

Name,Car,Location
John Doe,"['Car1', 'Car2']",Texas
Jane Roe,Car1,"['Illinois', 'Kansas']"

Here is the code:

import json
import pandas as pd

with open('file.json') as data_file:
    data = json.load(data_file)
df = pd.json_normalize(data, errors='ignore')  # pd.io.json.json_normalize before pandas 1.0

Would like it to end up like this:

Name,Car,Location
John Doe,Car1,Texas
John Doe,Car2,Texas
Jane Roe,Car1,Illinois
Jane Roe,Car1,Kansas

The answers worked great until I ran into a JSON file with extra data. This is what a file looks like with the extra values.

{
    "Customers": [
    {
        "Name": "John Doe",
        "Car": [
            "Car1",
            "Car2"
        ],
        "Location": "Texas",
        "Repairs": {
            "RepairLocations": {
                "RepairsCompleted":[
                    "Fix1",
                    "Fix2"
                ]
            }
        }
    },
    {
        "Name": "Jane Roe",
        "Car": "Car1",            
        "Location": [
            "Illinois",
            "Kansas"
        ]
    }
]
}

Here is what I am going for. I think it's the most readable in this format, but anything that at least shows all the keys would work.

Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,

Any suggestions on getting this second part?
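One possible way to approach the nested case is to flatten each record first, so that nested objects become path-style column names. After that every record is flat again, and list-valued fields can be expanded row by row as in the first example. The sketch below is only an illustration under that assumption: the helper name `flatten` and the `:` separator are hypothetical choices, not part of pandas or jq.

```python
def flatten(record, sep=":", prefix=""):
    """Collapse nested dicts into one flat dict keyed by the joined path."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, sep, path))
        else:
            # Scalars and lists are kept as-is for a later expansion step.
            flat[path] = value
    return flat


john = {
    "Name": "John Doe",
    "Car": ["Car1", "Car2"],
    "Location": "Texas",
    "Repairs": {"RepairLocations": {"RepairsCompleted": ["Fix1", "Fix2"]}},
}
flat_john = flatten(john)
print(flat_john)
```

Records that lack a nested section (Jane Roe here) simply will not have that key after flattening, so the CSV header would need to be the union of keys across all records, with missing values written as empty cells.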


Solution

  • You're looking for something like this jq filter (run it with jq -r so the @csv rows print as raw text):

    def expand($keys):
        . as $in
        | reduce $keys[] as $k ( [{}];
            map(. + { 
                ($k): ($in[$k] | if type == "array" then .[] else . end)
            })
        ) | .[];
    (.[0] | keys_unsorted) as $h
    | $h, (.[] | expand($h) | [.[$h[]]]) | @csv
    

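For readers who would rather stay in Python, the same idea can be sketched with only the standard library: treat every field as a list (wrapping scalars in a one-element list) and take the Cartesian product across fields. This is a minimal sketch rather than a pandas-native solution; the function names `expand` and `to_csv` are made up for illustration, and the header is assumed to come from the first record, as in the jq filter.

```python
import csv
import io
import itertools


def expand(record, keys):
    """Yield one row per combination of the list-valued fields in record."""
    columns = []
    for key in keys:
        value = record.get(key, "")  # missing keys become empty cells
        columns.append(value if isinstance(value, list) else [value])
    yield from itertools.product(*columns)


def to_csv(records):
    """Return CSV text with one line per expanded combination."""
    keys = list(records[0])  # header taken from the first record
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keys)
    for record in records:
        writer.writerows(expand(record, keys))
    return out.getvalue()


data = [
    {"Name": "John Doe", "Car": ["Car1", "Car2"], "Location": "Texas"},
    {"Name": "Jane Roe", "Car": "Car1", "Location": ["Illinois", "Kansas"]},
]
csv_text = to_csv(data)
print(csv_text)
```

Note that a field holding an empty list produces no rows at all for that record, which matches how the jq filter behaves.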