python google-bigquery

How to specify the default project to use with the BigQuery python client?


When writing a Python Cloud Function such as

from google.cloud.bigquery import Client

client = Client(
    project="my_gcloud_project",
    location="my_zone",
    )
sql_statement = "SELECT * FROM `my_dataset.my_table`"
query_job = client.query(sql_statement)
query_job.result()

I get a

ValueError: When default_project is not set, table_id must be a fully-qualified ID in standard SQL format, e.g., "project.dataset_id.table_id"

I understand that changing the query to SELECT * FROM `my_project.my_dataset.my_table` fixes the error, but is there a way to tell the BigQuery client which project to use by default, so that we do not have to specify it in every query?


Solution

  • At the client level, it looks like you can specify a default dataset by passing a QueryJobConfig as default_query_job_config. The default dataset itself must be qualified with the project:

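    from google.cloud.bigquery import Client, QueryJobConfig

    # The default dataset must be qualified with its project.
    job_config = QueryJobConfig(
        default_dataset="your_project.your_dataset"
    )

    # Queries issued through this client use job_config by default.
    client = Client(
        project="your_gcloud_project",
        location="your_zone",
        default_query_job_config=job_config,
    )

    # The unqualified table name is resolved against the default dataset.
    sql_statement = "SELECT * FROM `your_table`"

    query_job = client.query(sql_statement)
    query_job.result()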
    
    

    UPDATE:

    As Noan Cloarec pointed out, specifying the default dataset allows BigQuery to infer the project for other datasets. For example, the following will work even if the default dataset does not exist:

    from google.cloud.bigquery import Client, QueryJobConfig
    
    job_config = QueryJobConfig(
       default_dataset="your_gcloud_project.dataset_that_does_not_exists"
    )
    
    client = Client(
        project="your_gcloud_project", 
        location="your_zone",
        default_query_job_config=job_config
    )
    
    sql_statement = """
    SELECT * FROM 
    `dataset_that_exists.table_that_exists`
    """
    
    query_job = client.query(sql_statement)
    query_job.result()
    
    

    I believe the default_project mentioned in the error you are seeing comes from the internal TableReference class, some of whose methods (for example TableReference.from_string) accept a default_project argument. The name is probably just surfacing through the error trace rather than referring to a client-level setting.
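
If you would rather keep fully-qualified names in the SQL without typing the project each time, another option is to qualify the table ID yourself before building the query. The following is only a minimal sketch in plain Python: qualify_table_id and build_query are hypothetical helpers, not part of google-cloud-bigquery, and they assume simple dot-separated IDs (no dots inside the project ID). The error text is copied from the question to mirror the behaviour described above.

```python
from typing import Optional


def qualify_table_id(table_id: str, default_project: Optional[str] = None) -> str:
    """Return a "project.dataset.table" ID (hypothetical helper, not a library API)."""
    parts = table_id.split(".")
    if len(parts) == 3:
        # Already fully qualified: leave it untouched.
        return table_id
    if len(parts) == 2:
        if default_project is None:
            # Same situation as the error in the question.
            raise ValueError(
                "When default_project is not set, table_id must be a "
                "fully-qualified ID in standard SQL format, e.g., "
                '"project.dataset_id.table_id"'
            )
        # Prepend the default project to "dataset.table".
        return f"{default_project}.{table_id}"
    raise ValueError(f"Expected 2 or 3 dot-separated parts, got: {table_id!r}")


def build_query(table_id: str, default_project: Optional[str] = None) -> str:
    """Build a SELECT statement against the qualified table (hypothetical helper)."""
    return f"SELECT * FROM `{qualify_table_id(table_id, default_project)}`"
```

For example, build_query("my_dataset.my_table", default_project="my_project") produces a statement you can pass to client.query(...) as usual. The default_query_job_config approach above is still simpler when all queries go through one client.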