
Why is this Google Analytics query unsampled via the web UI but sampled via the GA API?


I recently queried the Google Analytics API for a report with device category, source, and medium as dimensions, covering about four weeks. Although I was able to build the equivalent ad-hoc report in the web UI and get results based on 100% of sessions, I couldn't get the API to return results based on more than about 1.3% of sessions. The client I'm using is built on the v3 API, but I got the same results with Google's v4 testing tool, so the API version isn't the cause.
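For reference, the query and the sampling check can be sketched roughly as below. This is a minimal sketch assuming the Reporting API v4 JSON shape, where a sampled report carries `samplesReadCounts` and `samplingSpaceSizes` in its `data` object and an unsampled one omits them. The helper names `build_report_request` and `sampling_fraction` are made up for illustration, and `ga:sessions` is an assumed metric, since the report's metric isn't stated here.

```python
from typing import Optional


def build_report_request(view_id: str, start_date: str, end_date: str) -> dict:
    """Build a Reporting API v4 batchGet body for the report described above.

    The metric (ga:sessions) is an assumption; the dimensions match the question.
    """
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "dimensions": [
                    {"name": "ga:deviceCategory"},
                    {"name": "ga:source"},
                    {"name": "ga:medium"},
                ],
                "metrics": [{"expression": "ga:sessions"}],
                # Ask for the largest sample the API is willing to use.
                "samplingLevel": "LARGE",
            }
        ]
    }


def sampling_fraction(report: dict) -> Optional[float]:
    """Return the fraction of sessions the report is based on, or None if unsampled.

    v4 encodes the counts as strings, one entry per requested date range.
    """
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not read or not space:
        return None  # fields absent: the report was not sampled
    return sum(int(n) for n in read) / sum(int(n) for n in space)
```

Applied to each entry of the `reports` array in a response, `sampling_fraction` gives the share of sessions behind the numbers, which is how a figure like the 1.3% above can be read off.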

According to Google's documentation, ad-hoc reports are supposed to use pre-aggregated unsampled data where possible:

Ad-hoc reports are based on any non-standard query of Analytics data. For example, if you apply a segment or secondary dimension to a standard report, then Analytics has to issue a new, non-standard query of the data to return that information.

The new query goes first to the tables of aggregated data to see if all of the requested information is available there. If the information is not available there, then Analytics queries the complete, unfiltered set of data and computes new aggregates to satisfy the application of the segment or secondary dimension.

This is apparently true of the web UI, but not necessarily of the API. I was under the impression that the web UI was making calls equivalent to those exposed in the API under the hood, but it seems that this isn't the case. Does anybody know whether it's possible to force the API query to use the pre-aggregated data sets that I know are available?


Solution

  • The difference in sampling thresholds between the web UI and the API does indeed explain this. The account in question is a 360 account, for which the web UI's sampling threshold is much higher than the one the API permits (the documentation is cagey about exact numbers, but apparently it can be "up to 100M sessions"). The same test on a standard account showed equivalent behavior between the API and the web UI. Google's issue tracker for the GA API indicates that they do not plan to raise the API's sampling threshold beyond 1M sessions, even for 360 accounts.
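
Since the API threshold can't be raised, one workaround people commonly use (not something confirmed above, so treat it as an assumption) is to split the date range into shorter windows, so that each request covers fewer sessions and is more likely to stay under the threshold, and then sum the results. The sketch below only does the splitting; `split_date_range` is a hypothetical helper and the 7-day window is arbitrary. Summing across windows is only valid for additive metrics such as sessions, not for metrics like users.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, days_per_chunk: int) -> List[Tuple[str, str]]:
    """Split [start, end] (inclusive) into consecutive windows of at most days_per_chunk days.

    Returns (startDate, endDate) pairs as YYYY-MM-DD strings, the format
    the API expects for date ranges.
    """
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    cursor = start
    while cursor <= end:
        # End of this window, clipped to the overall end date.
        chunk_end = min(cursor + timedelta(days=days_per_chunk - 1), end)
        chunks.append((cursor.isoformat(), chunk_end.isoformat()))
        cursor = chunk_end + timedelta(days=1)
    return chunks
```

Each pair can then be sent as its own request, and the sampling fields of every response checked before the rows are combined.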