Tags: json, google-apps-script, google-bigquery, google-apps-script-addon, jsonlines

Google Apps Script - How to stream JSON data into BigQuery?


The reference https://developers.google.com/apps-script/advanced/bigquery loads CSV data into BigQuery like this:

var file = DriveApp.getFileById(csvFileId);
var data = file.getBlob().setContentType('application/octet-stream');

// Create the data upload job.
var job = {
  configuration: {
    load: {
      destinationTable: {
        projectId: projectId,
        datasetId: datasetId,
        tableId: tableId
      },
      skipLeadingRows: 1
    }
  }
};
job = BigQuery.Jobs.insert(job, projectId, data);

As I understand it, they send a blob to BigQuery via file.getBlob().setContentType('application/octet-stream'), which is not convenient when the data is JSON rather than a CSV file.

How to send a JSON to BigQuery in Apps Script?

With the @google-cloud/bigquery library (used in a project outside of Apps Script), I can do something like this:

https://cloud.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples

// Import the Google Cloud client library
const { BigQuery } = require('@google-cloud/bigquery')
const moment = require('moment')

exports.insertUsageLog = async (userId) => {
  const datasetId = 'usage'
  const tableId = 'logs'
  const rows = [
    // The JSON data is collected here
    {
      timestamp: moment.utc().toISOString(),
      userId,
      // Something else ...
    },
  ]

  // Create a client
  const bigqueryClient = new BigQuery()

  // Insert data into a table
  await bigqueryClient
    .dataset(datasetId)
    .table(tableId)
    .insert(rows)
  console.log(`Inserted ${rows.length} rows`)
}

Solution

  • The data payload to BigQuery.Jobs.insert() has to be a blob.

    You can create that blob from either CSV content or newline delimited JSON. Newline delimited JSON (also called JSON Lines or NDJSON) is a distinct form of JSON that BigQuery requires: one complete JSON object per line, with lines separated by newline characters. It is not natively supported by Apps Script, but it is straightforward to produce from standard JSON by mapping each record through JSON.stringify() and joining the results with '\n'. Alternatively, you can use an existing JavaScript library (search NPM or GitHub for one).

    Once you've generated a newline delimited JSON (as either a string or a byte array) you'll need to convert it to a blob using Utilities.newBlob() and pass it to the BigQuery.Jobs.insert() method.
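Putting those steps together, a minimal sketch might look like the following. The `toNdjson` helper is plain JavaScript; the `loadJsonRows` helper is an assumption of how you might wire it up in Apps Script (it requires the BigQuery advanced service to be enabled, and `Utilities`/`BigQuery` only exist inside the Apps Script runtime):

```javascript
// Convert an array of plain objects to newline delimited JSON:
// one JSON.stringify()'d object per line, joined with '\n'.
function toNdjson(rows) {
  return rows.map(function (row) { return JSON.stringify(row); }).join('\n');
}

// Hypothetical Apps Script helper: wrap the NDJSON in a blob and
// submit a load job. Runs only inside Apps Script, where the
// Utilities service and the BigQuery advanced service are available.
function loadJsonRows(rows, projectId, datasetId, tableId) {
  var ndjson = toNdjson(rows);
  var data = Utilities.newBlob(ndjson, 'application/octet-stream');
  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        // The load job defaults to CSV, so this must be set explicitly.
        sourceFormat: 'NEWLINE_DELIMITED_JSON'
      }
    }
  };
  return BigQuery.Jobs.insert(job, projectId, data);
}
```

Note the `sourceFormat: 'NEWLINE_DELIMITED_JSON'` setting: load jobs assume CSV by default, so without it BigQuery will try to parse the JSON lines as CSV rows.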