Search code examples
google-apiexportgoogle-bigquerygoogle-cloud-storage

BigQuery - Check if table already exists


I have a dataset in BigQuery. This dataset contains multiple tables.

I am doing the following steps programmatically using the BigQuery API:

  1. Querying the tables in the dataset - Since my response is too large, I am enabling allowLargeResults parameter and diverting my response to a destination table.

  2. I am then exporting the data from the destination table to a GCS bucket.

Requirements:

  • Suppose my process fails at Step 2, I would like to re-run this step.

  • But before I re-run, I would like to check/verify that the specific destination table named 'xyz' already exists in the dataset.

  • If it exists, I would like to re-run step 2.

  • If it does not exist, I would like to do foo.

How can I do this?

Thanks in advance.


Solution

  • Here is a python snippet that will tell whether a table exists (deleting it in the process--careful!):

    def doesTableExist(project_id, dataset_id, table_id):
      bq.tables().delete(
          projectId=project_id, 
          datasetId=dataset_id,
          tableId=table_id).execute()
      return False
    

    Alternately, if you'd prefer not deleting the table in the process, you could try:

    def doesTableExist(project_id, dataset_id, table_id):
      try:
        bq.tables().get(
            projectId=project_id, 
            datasetId=dataset_id,
            tableId=table_id).execute()
        return True
      except HttpError, err
        if err.resp.status <> 404:
           raise
        return False
    

    If you want to know where bq came from, you can call build_bq_client from here: http://code.google.com/p/bigquery-e2e/source/browse/samples/ch12/auth.py

    In general, if you're using this to test whether you should run a job that will modify the table, it can be a good idea to just do the job anyway, and use WRITE_TRUNCATE as a write disposition.

    Another approach can be to create a predictable job id, and retry the job with that id. If the job already exists, the job already ran (you might want to double check to make sure the job didn't fail, however).