
Extracting hit-level data from Google Analytics


I am working on a project where I need to programmatically fetch the rawest possible data from Google Analytics, including historical data from both Universal Analytics (UA) and Google Analytics 4 (GA4) properties. Unfortunately, the options I've explored so far don't meet my needs, and I'm looking for advice or alternative solutions.

Solutions I found so far and constraints:

Google Analytics Data API & Google Analytics Reporting API: These options seem to restrict me to reports-type data, which doesn't suit my requirement for raw data analysis. I am looking for a way to bypass these limitations and fetch more granular data.

BigQuery Data Transfer API: This would have been an ideal solution, but it's not set up on the account. Additionally, it seems we can only set it up for GA4 accounts and not for the older UA properties. Plus, it only allows querying data collected after its setup, which doesn't help with my need for historical data access.

Google Analytics 360: Although GA 360 seems to provide the level of access I need, its cost ($150k/year) makes it unfeasible for me.

Given these constraints, I am seeking alternative methods or workarounds to access raw Google Analytics data. Ideally, I want to fetch all historical data up to the current date from both UA and GA4 properties without having to resort to BigQuery or GA 360 due to the reasons mentioned above.

Has anyone successfully implemented a solution for a similar scenario and can share their approach?


Solution

  • It's a common problem, and in short: no good news.

    The solution for GA4 is BQ. Nothing else will get you the raw hit-level data you need, so connect it to the account and proceed.

    UA, on the other hand, is a dark horse. Its API has an awkward limit of nine or so dimensions per request. You can get close to raw data out of it, but you can't seem to pull the client ID (cid) as a dimension, so there is nothing to glue one set of nine dimensions to the next. This has been known for a decade or so, which is why non-360 UA ETL has always involved tracking the client ID in a hit-level custom dimension. The ETL then joins the "report" downloads on that custom dimension plus a timestamp, effectively gluing the fractured hits back together in the DB.

    If you don't have the client ID custom dimension in your historical data, that's it. Unjoined reports are the best you'll get, unless you want to pay for 360 just to properly export the data into BQ. You can actually get 360 a lot cheaper if you buy it via a partner, since partners have some room to bargain. Some may even be able to arrange a good month-long deal, though that's unlikely.

    The general idea here is that GA UA has been the best free analytics out there. Even GA4, with all its new drawbacks and limitations, is still the best free solution available. As a free solution, it doesn't have to be perfect. It just has to be slightly better than the next best thing, which is really not a high bar.
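
To make the UA join described above more concrete, here is a minimal sketch of gluing two report downloads together on a client ID custom dimension plus a timestamp. It only covers the join step and assumes the rows have already been fetched. The custom dimension slots (`ga:dimension1` for the client ID, `ga:dimension2` for a hit timestamp), the `join_reports` helper, and the sample rows are all hypothetical and only for illustration; your property would use whatever slots the tracking setup wrote to.

```python
from collections import defaultdict

# Assumed join key: a client ID custom dimension plus a hit timestamp
# custom dimension. The slot numbers here are hypothetical.
JOIN_KEYS = ("ga:dimension1", "ga:dimension2")


def join_reports(*reports):
    """Merge rows from several report downloads that share the join key.

    Each report is a list of dicts (one dict per row), and every row must
    contain all JOIN_KEYS columns. A hit that appears in only some reports
    is kept with whatever columns were found for it.
    """
    hits = defaultdict(dict)
    for report in reports:
        for row in report:
            key = tuple(row[k] for k in JOIN_KEYS)
            hits[key].update(row)
    # Sort by (client ID, timestamp) so the output order is deterministic.
    return [hits[key] for key in sorted(hits)]


# Two made-up downloads, each carrying the join key plus one other dimension.
report_a = [
    {"ga:dimension1": "111.222", "ga:dimension2": "2023-01-05T10:00:01", "ga:pagePath": "/home"},
    {"ga:dimension1": "333.444", "ga:dimension2": "2023-01-05T10:00:02", "ga:pagePath": "/pricing"},
]
report_b = [
    {"ga:dimension1": "111.222", "ga:dimension2": "2023-01-05T10:00:01", "ga:source": "google"},
    {"ga:dimension1": "333.444", "ga:dimension2": "2023-01-05T10:00:02", "ga:source": "(direct)"},
]

joined = join_reports(report_a, report_b)
```

Note that the join is only as good as the key: if two hits from the same client share a timestamp at the recorded granularity, they collapse into one row, so a finer-grained timestamp makes the result more reliable.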