google-bigquery apache-beam-io

Set table description when creating table with apache_beam.io.gcp.bigquery.WriteToBigQuery


Is it possible to create a table with a provided description string (for the table) using Apache Beam's WriteToBigQuery?

The additional_bq_parameters argument is useful for setting, for example, the clustering or partitioning fields, but I cannot find a way to set the table description there, nor in the schema object that is passed.

Is there any alternative way to do this (create a table with description / set table description) using the native Beam functions?


Solution

  • WriteToBigQuery (apache_beam.io.gcp.bigquery.WriteToBigQuery) does not directly support setting a table description when it creates a table. It has no parameter for a description, and the schema parameter only defines the table's columns. To get a table with a description, use the following steps:

    1. Create the table separately: before running your Beam pipeline, create the BigQuery table with the BigQuery API or the bq command-line tool. Both let you set a description at creation time, and creating the table first guarantees that it exists before the pipeline tries to write to it. For more details, refer to this documentation.

    2. Use WriteToBigQuery with CREATE_NEVER: in your Beam pipeline, pass beam.io.BigQueryDisposition.CREATE_NEVER as the create_disposition argument of WriteToBigQuery. Beam then writes data to the existing table instead of trying to create it. For details, refer to link1 and link2.