I have a dbt cloud project which queries some BigQuery data, applies some transformations and returns some datamarts. The source data is a collection of sharded tables i.e. GA4 events tables
I want to test for source freshness. aka return a warning if the new events table is older than 24h.
This is how I meant to go about it: Check the name of the new events table i.e. events_20231123, parse the suffix (20231123) as a timestamp, and check if that timestamp is older than 24h.
version: 2
sources:
- name: google_analytics
description: "Google Analytics 4 export"
database: my_project_name
schema: analytics_00000000 #the name of the BQ dataset
tables:
- name: events
identifier: events_*
loaded_at_field: parse_timestamp(_table_suffix, '%Y%m%d')
freshness:
warn_after:
count: 24
period: hour
error_after:
count: 36
period: hour
Which returns the following error: 07:00:43 1 of 1 ERROR freshness of google_analytics.events .............................. [ERROR in 3.25s]
07:00:43
07:00:43
07:00:43 Database Error in source events (models/_source/_sources.yml)
Mismatch between format character '2' and string character '%' at string index: 0 with format: '20231120'
07:00:43 Done.
How may I go about this? Should I just come up with a timestamp column in my sharded tables and test the refresh on this column? Why isn't my regex working?
thank you for your time.
Two things:
parse_timestamp()
function, you first need to specify the format, and then the value. E.g. parse_timestamp('%Y%m%d', _table_suffix)
. See docs_table_suffix
at 00:00:00. Why don't you use the event_timestamp
from each record instead? Not perfect, but at least you will exactly know from which timestamp is each record.