Search code examples
google-analyticsgoogle-bigquerygoogle-analytics-4data-collection

Streaming Google Analytics 4 data to BigQuery causing data collection issues


We have configured a linking between the GA 4 property and GoogleBigQuery via the GA interface (without any additional code). It works fine, we see a migrated data in GBQ tables, but however, we face an issue with how this data is written in those tables.

If we look at any table we could see that events from different users can be recorded in one session (and there can be different clientIDs (and even usedIDs, which we pass when authorizing a user)) See an example

This is a result of executing following query:

SELECT
    event_name,
    user_pseudo_id,
    user_id,
    device.category,
    device.mobile_brand_name,
    device.mobile_model_name,
    device.operating_system_version,
    geo.region,
    geo.city,
    params.key,
    params.value.int_value
FROM `%project_name%.analytics_256374149.events_20210331`, unnest(event_params) AS params
WHERE event_name="page_view"
AND params.value.int_value=1617218965
ORDER BY event_timestamp 

As a result, you can see that within one session different users from different regions, with different devices and identifiers are combined. It is, of course, impossible to use such data for reporting purposes. Once again, it is a default GA4 → BigQuery setup in the GA4 interface (no add-ons).

We do not understand what the error is (in import, in requests, or somewhere else) and would like to get advice on this issue.

Thanks.


Solution

  • You should look at the combination of user_pseudo_id and the event_param ga_session_id. This combination is unique and used for measuring unique sessions across a property.

    For example, this query counts the number of unique event names in each session:

    SELECT
      user_pseudo_id,
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
      COUNT(DISTINCT event_name) AS unique_event_name_count
    FROM `<project>.<dataset>.events_*`
    GROUP BY user_pseudo_id, ga_session_id