google-bigquery google-analytics dbt

Schedule update of Google Analytics dataset on BigQuery


I am aware that Google Analytics can be linked to BigQuery using the BigQuery Linking feature in GA.

However, the drawback I have run into is that the export lands at an unpredictable time. This breaks my tables that depend on the GA data, which I build at 9 AM using dbt: if the GA data is updated after 9 AM, my tables won't contain today's GA data.

My questions are:

  1. Is there a way to schedule the GA data update at a fixed time, as a cron job would?
  2. If not, is there a way for dbt to run the job after the GA data has been updated in BigQuery?

Solution

  • Unfortunately, Google provides no SLA on the BigQuery export from Google Analytics 3. If you have the option, the best solution would be to migrate to Google Analytics 4, which has an almost real-time export to BigQuery and appears to be much more robust. Find out more on the official Google support page.

    I currently get around this by using event-based triggers that look at the metadata of a table, or check for the existence of yesterday's sharded table, and then proceed to the downstream jobs. I'm sure you could achieve something similar with dbt.

    Here is some example SQL code which checks for the existence of yesterday's Google Analytics sharded table by returning the maximum timestamp:

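    -- Date of the newest daily ga_sessions_YYYYMMDD shard from the last 9 days, as a timestamp
    SELECT MAX(CAST(PARSE_DATE('%Y%m%d', SUBSTR(table_id, 13)) AS TIMESTAMP)) AS max_date
    FROM `my_ga_dataset.__TABLES__`
    WHERE table_id LIKE '%ga_sessions_%'
      AND table_id NOT LIKE '%intraday%'  -- skip ga_sessions_intraday_ shards
      AND PARSE_DATE('%Y%m%d', SUBSTR(table_id, 13)) >= CURRENT_DATE() - 9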
    

    This works for sharded tables. If you want to use table metadata to get the date/time of the last table update, you can use INFORMATION_SCHEMA: https://cloud.google.com/bigquery/docs/information-schema-tables
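
As a rough illustration of the gating step described above, here is a minimal Python sketch that decides whether yesterday's daily shard has arrived, given a list of table IDs such as those returned by the `__TABLES__` query. The function names `latest_shard_date` and `yesterday_shard_ready` are hypothetical, and the sketch assumes daily shards are named `ga_sessions_YYYYMMDD`; it does not talk to BigQuery or dbt itself.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, Optional

# Assumption: daily export shards are named ga_sessions_YYYYMMDD.
# Intraday shards (ga_sessions_intraday_YYYYMMDD) do not match this pattern.
SHARD_PATTERN = re.compile(r"^ga_sessions_(\d{8})$")


def latest_shard_date(table_ids: Iterable[str]) -> Optional[date]:
    """Return the newest date among daily ga_sessions_ shards, or None."""
    dates = []
    for table_id in table_ids:
        match = SHARD_PATTERN.match(table_id)
        if not match:
            continue
        try:
            dates.append(datetime.strptime(match.group(1), "%Y%m%d").date())
        except ValueError:
            # Eight digits that are not a valid calendar date: ignore the table.
            continue
    return max(dates, default=None)


def yesterday_shard_ready(table_ids: Iterable[str], today: date) -> bool:
    """True if a daily shard dated yesterday (or later) exists."""
    latest = latest_shard_date(table_ids)
    return latest is not None and latest >= today - timedelta(days=1)
```

A scheduler could call `yesterday_shard_ready` every few minutes with the current table list and only start the dbt job (for example via `dbt run`) once it returns `True`, instead of running dbt at a fixed 9 AM.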