Search code examples
google-bigquery

Loading data to ingestion time partitioned table using Write_Truncate overwrites all partitions


I have a table partitioned in daily ingestion time. My intention is to overwrite an specific partition. But instead of overwriting one, is overwriting all of the partitions and asigning them to the current ingestion time. I'm still new to BigQuery and the only solution that I think is using ´merge´ but feels like too much extra steps.

I'm following the documentation, the format to select an specific partition should be:

{projectID}.{dataset}.{table_name}${part_date}

I have also checked this stackoverflow question here.

For example I have a table with one partition on 2024-10-02. When I load the same data today, it will overwrite although it should just create another partition for 2024-10-03.

My code is looking like this:

       part_date = '$20241003'
       job_config = bigquery.LoadJobConfig(
            write_disposition=options.write_disposition,
            schema= options.schema,
            time_partitioning=bigquery.table.TimePartitioning(type_=options.partition_type)
        )
        
        table = bigquery.Table(f"{self._client.project}.{options.dataset}.{options.table_name}{part_date}")
        
        table.require_partition_filter = options.require_partition_filter

        self._client.load_table_from_dataframe(
            options.dataframe,
            table,
            job_config=job_config,
        ).result() 

Solution

  • Your code is explicitly specifying the partition date in the table object using {self._client.project}.{options.dataset}.{options.table_name}{part_date}. This tells BigQuery to target that specific partition for the load operation, leading to the overwrite behavior.

    Inside your job_config, try to define your destination table as part_date.

    part_date = '20241003'  
    job_config = bigquery.LoadJobConfig(
        write_disposition=options.write_disposition, 
        schema=options.schema,
     time_partitioning=bigquery.table.TimePartitioning(type_=options.partition_type),
        destinationTable=part_date 
    )
    
    

    Remove the partition date in your table definition and see how it goes:

    table = bigquery.Table(f"{self._client.project}.{options.dataset}.{options.table_name}")