Search code examples
pythongoogle-bigqueryddl

How to exclude a column when creating BigQuery external table?


I'm trying to create an external table in BQ using data stored in GCS bucket. Below is the DDL command I'm using:

CREATE OR REPLACE EXTERNAL TABLE `external table`
OPTIONS (
  format = 'parquet',
  uris = ['gs://...', 'gs://...']
);

How can I exclude a particular column from being imported to external table? Since I cannot ALTER an external table to DROP COLUMN after being created.


Solution

  • There is no provision for selecting columns while loading data from GCS.

    In the following documentation provided by google all possible configurations and properties for loading data to bigQuery from GCS is provided. But the option that you are looking for is not there. https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv

    But there are other ways to do this.

    First option,

    You can load this to data to a temp table first, and then select the required columns from that and populate the second table.

    Second option,

    If you don't want to do this manually. You can make use of CloudRun on BigQuery events. Whenever you load data into first table, it triggers Cloudrun in which you can write your code to remove the column you do not want and insert those into the second table.

    https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events

    Third Option,

    If this is just a one time activity, you can load the whole data into one table and then create a view with required columns on top of it.