Tags: google-bigquery, google-analytics, google-analytics-4

Google Analytics / Google Tag Manager: What are reliable and/or canonical methods of consolidating a partitioned data stream for analysis?


I have about 50 GA4 properties, each belonging to the same GA4 account. Each property has its own data stream with a unique Measurement ID. These properties are measuring various directories of the same domain, for example:
www.example.com/properties/a/, www.example.com/properties/b/, ..., www.example.com/properties/z/.
I don't want to get into why it's architected this way. Just know that the partitioning was made necessary by a vendor that tracks charges for data in a manner facilitated by this separation.

I now want to analyze total site traffic for trends. Some early ideas:

  1. Create a master GA4 property with a web stream for each property. GA4 limits a property to 50 data streams total, so this doesn't scale; we're practically at the limit already.
  2. Create a master GA4 property with its own data stream and install this second stream via GTM on every web page. This seems plausible and has some background here(1) and here(2). Note, though, that the advice for renaming the dataLayer conflicts between the two. Making matters worse, Google seems to have removed its suggestion from the documentation referenced in the second link.
  3. Create a master GA4 property with its own data stream and use a single GTM container to duplicate each event, sending it to both GA4 properties. This is known to cause cookie issues, though someone here seems to have found a workaround with custom HTML tags. It's worth noting that this is unsupported, and I'm not sure how well it's been tested.
  4. Export the data to BigQuery via GCP's automated process. Once it's all in there, UNION the data from each property to create one consolidated dataset for analysis. This has some drawbacks: first, how can we be sure that users who navigated between data streams on the website aren't counted as separate users in the consolidated dataset? I worry there are other issues I'm not thinking of as well, potentially to do with source/medium attribution metrics. Second, and less importantly, a new solution is required for front-end analysis, since the data has been taken out of GA4.
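For reference, the UNION in option 4 can be sketched as a scheduled query over the per-property export datasets. The project and dataset names below (`my-project`, `analytics_111111111`, etc.) are placeholders; GA4 names each export dataset `analytics_<property_id>`:

```sql
-- Consolidate the daily export tables from each property's dataset
-- into one table, tagging rows with their source property so the
-- vendor partition is preserved for later filtering.
CREATE OR REPLACE TABLE `my-project.consolidated.events` AS
SELECT '111111111' AS property_id, *
FROM `my-project.analytics_111111111.events_*`
UNION ALL
SELECT '222222222' AS property_id, *
FROM `my-project.analytics_222222222.events_*`;
-- ...repeat for the remaining properties.
```

Keeping the `property_id` column lets you `GROUP BY` across the whole site for trend analysis while still being able to break results back out per directory.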

Currently, I am getting set up using GA4's BigQuery integration to stream web-event data into the warehouse automatically. This seems like the "least worst" solution in my book, as it is an officially supported method. However, I've yet to find a reliable assertion that we won't see oddities in the consolidated dataset due to the GA4 account structure. If we should expect issues, what might they be?
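One way to probe for the user-splitting oddity directly is to check whether the same `user_pseudo_id` shows up in more than one property's export. This is a hedged sketch with placeholder project/dataset names; cross-directory visitors should appear here if the client ID is shared across the domain:

```sql
-- Sanity check: users seen in more than one property's export.
-- If this returns rows, cross-directory visitors carry the same
-- user_pseudo_id rather than being split into separate users.
WITH tagged AS (
  SELECT '111111111' AS property_id, user_pseudo_id
  FROM `my-project.analytics_111111111.events_*`
  UNION ALL
  SELECT '222222222' AS property_id, user_pseudo_id
  FROM `my-project.analytics_222222222.events_*`
)
SELECT
  user_pseudo_id,
  COUNT(DISTINCT property_id) AS properties_seen
FROM tagged
GROUP BY user_pseudo_id
HAVING properties_seen > 1
LIMIT 100;
```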

I'm hoping for a canonical answer to addressing my root issue. It's not an option to simply redesign the account to use one stream for the entire domain, as our vendor would then have no way to track the services they charge for. How should I obtain a consolidated dataset of all properties for analysis?


Solution

  • I agree that BigQuery is the best option. You don't need to worry about users being counted multiple times: the user_pseudo_id comes from the _ga cookie, which is scoped to the domain, so it remains identical across properties. The only consideration is a user moving between directories and picking up a new source or medium. To verify accuracy, manually test whether a session_start event fires on such a transition, or whether the new page_view carries an ignore_referrer parameter with a null value. If either of these occurs, revisit the unwanted-referrals and cross-domain tracking settings in your properties. Nothing else should trip up your setup.
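The manual test described above can also be run against the export itself. This is a sketch with placeholder names (`my-project`, `analytics_111111111`, and the `www.example.com` domain from the question): it looks for session_start events whose page_referrer is your own domain, which would indicate a new session (and possibly a new source/medium) began mid-visit as the user crossed directories:

```sql
-- Find session_start events that were triggered by an internal
-- referral, i.e. the user arrived from another page on our own
-- domain. Rows here suggest the unwanted-referrals setting needs
-- attention in the affected property.
SELECT
  event_date,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_referrer') AS page_referrer
FROM `my-project.analytics_111111111.events_*`
WHERE event_name = 'session_start'
  AND (SELECT value.string_value FROM UNNEST(event_params)
       WHERE key = 'page_referrer') LIKE '%www.example.com%'
LIMIT 100;
```

If this returns rows, adding your domain to "List unwanted referrals" in each property's data stream settings should stop the internal traffic from resetting source/medium.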