
Converting JSON into newline delimited JSON in Python


My goal is to convert a JSON file into a format that can be uploaded from Cloud Storage into BigQuery (as described here) with Python.

I have tried using the newlineJSON package for the conversion, but I receive the following error:

JSONDecodeError: Expecting value or ']': line 2 column 1 (char 5)

Does anyone have the solution to this?

Here is the sample JSON code:

[{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
},
{
    "key01": "value01",
    "key02": "value02",
    ...
    "keyN": "valueN"
}
]

And here's the existing Python script:

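import newlinejson as nlj

# url_samplejson and url_convertedjson are defined elsewhere (input and output locations)
with nlj.open(url_samplejson, json_lib="simplejson") as src_:
    with nlj.open(url_convertedjson, "w") as dst_:
        for line_ in src_:
            dst_.write(line_)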
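
For context, a likely cause of the error, assuming newlineJSON parses every line of the input as its own JSON document: the sample file is a single JSON array spread over many lines, so its first line (`[{`) is not valid JSON by itself. The minimal sketch below reproduces that situation with the standard `json` module rather than with newlineJSON; the variable names are illustrative only.

```python
import json

# What a line-by-line reader would see first in the sample file (assumption).
first_line = "[{"

try:
    json.loads(first_line)
    parsed = True
except json.JSONDecodeError:
    # A fragment of an array is not a complete JSON document.
    parsed = False

print(parsed)
```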

Solution

  • The answer with jq is really useful, but if you still want to do it in Python (as the question suggests), you can use the built-in json module.

    import json
    from io import StringIO
    in_json = StringIO("""[{
        "key01": "value01",
        "key02": "value02",
    
        "keyN": "valueN"
    },
    {
        "key01": "value01",
        "key02": "value02",
    
        "keyN": "valueN"
    },
    {
        "key01": "value01",
        "key02": "value02",
    
        "keyN": "valueN"
    }
    ]""")
    
    result = [json.dumps(record) for record in json.load(in_json)]  # the only significant line to convert the JSON to the desired format
    
    print('\n'.join(result))

    Output:

    {"key01": "value01", "key02": "value02", "keyN": "valueN"}
    {"key01": "value01", "key02": "value02", "keyN": "valueN"}
    {"key01": "value01", "key02": "value02", "keyN": "valueN"}
    

    * I'm using StringIO and print here just to make the sample easier to test locally.

    As an alternative, you can use the Python jq bindings to combine this with the other answer.
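
The same idea can be applied to real files instead of StringIO and print. The following is a minimal sketch, not a definitive implementation: the function name `json_array_to_ndjson` and its path parameters are hypothetical, and it assumes the input file holds one JSON array that fits in memory.

```python
import json


def json_array_to_ndjson(src_path, dst_path):
    """Convert a file holding one JSON array into newline-delimited JSON."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # loads the whole array into memory

    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # Without indent, json.dumps escapes newlines inside strings,
            # so every record ends up on exactly one line.
            dst.write(json.dumps(record) + "\n")
```

Calling `json_array_to_ndjson("sample.json", "converted.json")` (both file names are placeholders) would write one object per line to the output file. For inputs too large to hold in memory, a streaming parser would be a better fit than `json.load`.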