python · json · google-bigquery · google-api · google-cloud-storage

Write a new key to every line in an NDJSON file


I'm trying to format data to feed into an API. The request body should look like this:

{ userEvent: { eventType: "home-page-view", visitorId: "visitor-1", userInfo: { userId: "user-1" } } }
{ userEvent: { eventType: "home-page-view", visitorId: "visitor-2", userInfo: { userId: "user-2" } } }
{ userEvent: { eventType: "home-page-view", visitorId: "visitor-3", userInfo: { userId: "user-3" } } }
...

I have a query that gets the required columns in BigQuery and then pushes the table result to GCS in NDJSON format. However, I have no clue how to add a userEvent key that wraps the currently available results, either (1) during the query or (2) after storing the NDJSON file in GCS. Given that the expected number of rows is roughly one million, I wonder what the most efficient solution is, as I'd like to avoid a for loop that goes through each line...

This is what I have now:

# get the BigQuery result and store in the "results" variable
query_job = bq_client.query(query, job_config=job_config, location="US")
results = query_job.result() #<google.cloud.bigquery.table.RowIterator at 0x148728310>

# The current NDJSON file in GCS
{"eventType":"home-page-view","visitorId":"13245","userInfo":{"userId":"11111"}
{"eventType":"home-page-view","visitorId":"56789","userInfo":{"userId":"22222"}

Solution

  • As commented by @Anita, you can refer to the sample code below, taken from this stack link, to add a key:value pair to a JSON document:

    import json
    
    # Use separate names for the path and the file handle; the original
    # snippet reused "json_file" for both, which breaks the second open().
    with open(json_path) as f:
        json_decoded = json.load(f)
    
    # Add the new key:value pair.
    json_decoded['KEY'] = 'VALUE'
    
    with open(json_path, 'w') as f:
        json.dump(json_decoded, f)
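
    Note that json.load expects the whole file to be a single JSON document, while the exported file is NDJSON (one JSON object per line), so the idea has to be applied per line. Below is a minimal sketch under that assumption; the local file names are placeholders. Reading one line at a time keeps memory use flat even at a million rows, and a Python-level loop over the lines is hard to avoid once the data is already sitting in GCS:

    import json
    
    src_path = "user_events.ndjson"          # hypothetical input file
    dst_path = "user_events_wrapped.ndjson"  # hypothetical output file
    
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Wrap each record under the "userEvent" key the API expects.
            dst.write(json.dumps({"userEvent": json.loads(line)}) + "\n")

    The same loop can run against GCS directly: google-cloud-storage's Blob.open() returns a file-like object, so the local open() calls can be swapped for blob.open("r") and blob.open("w") to stream the rewrite without downloading the file first.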
    
    

    Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

    Feel free to edit this answer for additional information.
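
  • Alternatively, the wrapping can happen during the query itself (option 1 in the question), so the file that lands in GCS already carries the userEvent key and no post-processing is needed. One way is to select a single STRUCT column named userEvent: BigQuery's newline-delimited JSON export then emits {"userEvent": {...}} per line. A sketch, assuming the source columns are named eventType, visitorId, and userId (the project, dataset, table, and bucket names are placeholders):

    from google.cloud import bigquery
    
    bq_client = bigquery.Client()
    
    # EXPORT DATA writes the query result straight to GCS as NDJSON.
    query = """
    EXPORT DATA OPTIONS (
      uri = 'gs://my-bucket/user_events/*.json',
      format = 'JSON',
      overwrite = true
    ) AS
    SELECT
      STRUCT(
        eventType,
        visitorId,
        STRUCT(userId) AS userInfo
      ) AS userEvent
    FROM `my-project.my_dataset.my_table`
    """
    
    bq_client.query(query, location="US").result()  # block until the export finishes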