Search code examples
sqlgoogle-bigquerysql-updatedata-partitioning

UPDATE statement in BigQuery that sets _PARTITIONDATE equal particular date field in your table


I have a table in BigQuery with a date column called dt where all data for all dates is currently in the partition DATE(_PARTITIONTIME)="2022-01-20". I want to run an UPDATE statement that makes dt=DATE(_PARTITIONTIME) for all partitions. For example, where DATE(_PARTITIONTIME)="2022-01-19" there will only be data where dt="2022-01-19", where DATE(_PARTITIONTIME)="2022-01-18" there will only be data where dt="2022-01-18" and so on.

This is something that I have done in the past so I know is possible, I just can't remember exactly what the statement was.


Solution

  • Creating a partitioned table

    Since you don't need a table partition by ingestion time, you can create your table using your own date field as the partition field. You can do so by adding the "PARTITON BY" statement when creating a table, like this

    CREATE TABLE `project_id.mydataset.mytable` (
      field1 STRING,
      dt TIMESTAMP
    )
    PARTITION BY DATE(dt)
    

    or

    CREATE TABLE `project_id.mydataset.mytable`
    PARTITION BY DATE(dt)
    AS (
      SELECT * FROM `project_id.mydataset.othertable`
    )
    
    

    Updating the _PARTITIONTIME

    Addressing your original question, if you need you can also update the _PARTITIONTIME field. To set all _PARTITIONTIME fields equal to your dt column, you can do the following:

    UPDATE
      project_id.dataset.mytable
    SET
      _PARTITIONTIME = dt
    WHERE
      1=1
    

    If dt has a different granularity than _PARTITIONTIME (_PARTITIONTIME granularity is day and dt is hour, for example), than you can do a TIMESTAMP_TRUNC

    UPDATE
      project_id.dataset.mytable
    SET
      _PARTITIONTIME = TIMESTAMP_TRUNCT(dt, DAY)
    WHERE
      1=1