python, google-bigquery

Python dictionary with nested dict to bigquery


Assume my BigQuery table has a column of JSON type.

I am trying to insert a dictionary into my table using the Python BQ client. I am able to insert the data using functions like load_table_from_json, and I have specified the column as JSON type in the job config schema of the BigQuery table.

My dictionary in python - {"A": 2, "B": 1, "C": 4, "D": 3}

The result in the table:

"{\n\"A\":2,\n\"B\":1,\n\"C\":4,\n\"D\":3\n}"

I don't want the delimiters to appear. I understand why they appear: BigQuery converts the dictionary into a type it can understand, resulting in the structure above when the value is written to the table.

I have a manual workaround if I write the insert statement directly and use the bq_client.query() function: I can make use of the PARSE_JSON function, which gives the desired result of {"A": 2, "B": 1, "C": 4, "D": 3}.
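For reference, that workaround looks roughly like this (the table name and column name are placeholders, and the final query call is commented out since it needs an authenticated client):

```python
import json

record = {"A": 2, "B": 1, "C": 4, "D": 3}

# PARSE_JSON converts a JSON-formatted string into a JSON value
# server-side, so the stored value has no escaped quotes.
query = (
    "INSERT INTO `project.dataset.table` (your_json_column) "
    f"VALUES (PARSE_JSON('{json.dumps(record)}'))"
)
# bq_client.query(query).result()  # requires an authenticated bigquery.Client
```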

But I couldn't find a way to achieve the same with the Python BQ library. Is there a way to do this?


Solution

  • As Barmar mentioned, it seems like you're serializing the JSON twice. Here's why:

    If you've already set your BigQuery column as a JSON type, there's no need to use json.dumps(). BigQuery is already expecting the data in JSON format. When you use json.dumps(), you're serializing the dictionary into a JSON string, and then when you insert it into BigQuery, it serializes it again, causing the escaped characters.

    Instead, you should pass the dictionary directly to BigQuery, like this:

    data = [{"your_json_column": {"A": 2, "B": 1, "C": 4, "D": 3}}]
    
    client.load_table_from_json(data, 'project.dataset.table')
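A minimal, self-contained sketch of both the problem and the fix is below. The column and table names are placeholders, and the BigQuery calls are commented out since they require google-cloud-bigquery and credentials:

```python
import json

record = {"A": 2, "B": 1, "C": 4, "D": 3}

# What goes wrong: dumping the dict yourself produces a string, and when
# BigQuery serializes that string again, the inner quotes get escaped.
double_serialized = json.dumps(json.dumps(record))
print(double_serialized)  # "{\"A\": 2, \"B\": 1, \"C\": 4, \"D\": 3}"

# The fix: hand load_table_from_json the dict itself and declare the
# column as JSON in the schema, so serialization happens exactly once.
rows = [{"your_json_column": record}]

# from google.cloud import bigquery
# client = bigquery.Client()
# job_config = bigquery.LoadJobConfig(
#     schema=[bigquery.SchemaField("your_json_column", "JSON")]
# )
# client.load_table_from_json(
#     rows, "project.dataset.table", job_config=job_config
# ).result()
```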