Search code examples
google-bigquerygoogle-analyticsgoogle-analytics-4

How to filter by multiple event_params in BQ for GA4 events


I have GA4 data in BigQuery and I am aware I can use unnest to select values from the event_params. However, I dont know how to filter based on multiple values (e.g. dimensions) at the same time. For instance I have two dimensions called page_path and previous_page_path which I can select individually by

SELECT 
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_path') AS page_path,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'previous_page_path') AS previous_page_path,
FROM `project.dataset.events_*`

But when I try to add a Where now to do additional filtering the alias names do not get resolved and BigQuery returns an error. The following for example does not work:

SELECT 
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_path') AS page_path,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'previous_page_path') AS previous_page_path,
FROM `project.dataset.events_*`
WHERE page_path = "/"

How can this be done?

In the end I want to be able to filter for multiple event_params. I am pretty sure it requires multiple unnesting steps since key can not have two values at the same time but I do not know as to how exactly the query needs to look like to do this.


Solution

  • I've used GA4 data in BigQuery public dataset to demonstrate the query. Assuming you want to filter the data based on the values of 2 dimensions:

    • key : page_title, value : Home
    • key : session_engaged, value : 1

    you can consider below.

    SELECT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') AS page_title,
           (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged,
      FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
     WHERE (SELECT COUNT(1) FROM UNNEST(event_params) e 
             WHERE (key, value.string_value) IN (('page_title', 'Home'), ('session_engaged', '1'))
           ) = 2;
    
    -- Query results
    +------------+-----------------+
    | page_title | session_engaged |
    +------------+-----------------+
    | Home       |               1 |
    | Home       |               1 |
    | Home       |               1 |
    +------------+-----------------+
    
    • Where clause returns true only when found both dimensions at the same time. (I assumed that each dimension isn't duplicate in the event_params in a row)