
Is it possible to set an expiration time for records in BigQuery?


Is it possible to set a time to live for records in BigQuery based on a column? For example, given a table with the columns payment_details and timestamp, a row should be deleted automatically once the difference between the current time and its timestamp exceeds 90 days.


Solution

  • Solution 1:

    BigQuery has a partition expiration feature. You can leverage that for your use case.

    Essentially, you need to create a partitioned table and set the partition_expiration_days option to 90.

    CREATE TABLE
      mydataset.newtable (transaction_id INT64, transaction_date DATE)
    PARTITION BY
      transaction_date
    OPTIONS(
      partition_expiration_days=90
    )
    

    Or, if you already have a table partitioned by the right column:

    ALTER TABLE mydataset.mytable
     SET OPTIONS (
       -- Sets partition expiration to 90 days
       partition_expiration_days=90
     )
    

    When a partition expires, BigQuery deletes the data in that partition.

  • Solution 2:

    You can set up a scheduled query that prunes data older than 90 days on an hourly or daily basis. By writing a DELETE query you get more control and can combine other business logic, such as deleting only duplicate rows while keeping the most recent entry even if it is older than 90 days.
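    As a minimal sketch, assuming a table like mydataset.mytable with a timestamp column named `timestamp` (the table and column names here are placeholders based on the question), the scheduled query could run a statement like:

    -- Delete all rows whose timestamp is more than 90 days old.
    -- Backticks around `timestamp` avoid clashing with the reserved type name.
    DELETE FROM mydataset.mytable
    WHERE `timestamp` < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);

    Scheduling this DELETE hourly or daily in BigQuery's scheduled queries gives the same effect as partition expiration, but lets you add extra conditions to the WHERE clause as your business logic requires.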

  • Solution 3:

    If you have a larger business process that performs the 90-day pruning based on other external factors, such as an API response or a conditional evaluation, you can leverage Cloud Workflows to build a workflow and invoke it regularly to automate the pruning of your data. See the article Automate the execution of BigQuery queries with Cloud Workflows, which can guide you through this.
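    As a rough sketch of what such a workflow could look like, the Cloud Workflows BigQuery connector (googleapis.bigquery.v2.jobs.query) can run the DELETE statement; the project ID, dataset, table, and column names below are placeholders, and any extra conditional steps would be added around the query step:

    # Hypothetical Workflows definition that prunes rows older than 90 days.
    main:
      steps:
        - prune_old_rows:
            call: googleapis.bigquery.v2.jobs.query
            args:
              projectId: my-project        # placeholder project ID
              body:
                useLegacySql: false
                query: >-
                  DELETE FROM mydataset.mytable
                  WHERE `timestamp` < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
            result: queryResult
        - done:
            return: ${queryResult}

    You would then deploy the workflow and trigger it on a schedule (for example with Cloud Scheduler), optionally adding steps before the query that call your external API and decide whether pruning should run.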