
Google Analytics API: fetch the user bucket score of every client ID


I am digging into the User Bucket feature in Google Analytics to find out which client IDs are in the treatment and control groups of my campaign experiments in Google Ads. Client ID is a custom dimension with index 27 in my Google Analytics settings. I am following the developer guide here: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/

I am trying to fetch the (date, client_id, user_bucket, user) values using the Google Analytics API, but it seems that the API returns only about 50% of the total data.

Here is the request I use to check (date, user). Its result matches the number shown in the GA UI, which is good.

# Excerpt from the function that sends the request: users per day only.
return (
    analytics.reports()
    .batchGet(
        body={
            "reportRequests": [
                {
                    "viewId": VIEW_ID,
                    "pageSize": "100000",
                    "pageToken": pageToken,
                    "dateRanges": [
                        {"startDate": dateRange[0], "endDate": dateRange[1]}
                    ],
                    "metrics": [
                        {"expression": "ga:users"},
                    ],
                    "dimensions": [
                        {"name": "ga:date"},
                    ],
                }
            ]
        }
    )
    .execute()
)

Output: [screenshot not available]

However, when I add client_id and user_bucket as dimensions, the number drops by about 50%.

# Same request, now also split by client ID (custom dimension 27) and user bucket.
return (
    analytics.reports()
    .batchGet(
        body={
            "reportRequests": [
                {
                    "viewId": VIEW_ID,
                    "pageSize": "100000",
                    "pageToken": pageToken,
                    "dateRanges": [
                        {"startDate": dateRange[0], "endDate": dateRange[1]}
                    ],
                    "metrics": [
                        {"expression": "ga:users"},
                    ],
                    "dimensions": [
                        {"name": "ga:date"},
                        {"name": "ga:dimension27"},
                        {"name": "ga:userBucket"},
                    ],
                }
            ]
        }
    )
    .execute()
)

The resulting output is: [screenshot not available]

I then aggregated the client_id rows to the date level, and the result does not match the previous user numbers. Also, I cannot figure out why ga_user has the constant value 2 (I think it should be 1). Thanks!

[screenshot not available]


Solution

  • Check whether your response contains sampled data; that could be the reason.

    You are querying a non-standard report and, judging by the number of users, you are likely exceeding the sampling threshold of 500,000 sessions.
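
One way to check for sampling, sketched under the assumption that you are using the Reporting API v4 as in the question: a sampled report carries `samplesReadCounts` and `samplingSpaceSizes` in its `data` object, and an unsampled one omits them. The helper name `sampling_info` and the two example responses below are made up for illustration; only the field names come from the API.

```python
def sampling_info(response):
    """Return one dict per report saying whether it was sampled.

    `response` is the dict returned by batchGet(...).execute().
    """
    results = []
    for report in response.get("reports", []):
        data = report.get("data", {})
        # Both fields are lists of int64 values encoded as strings,
        # one entry per date range. They are absent when unsampled.
        read = [int(n) for n in data.get("samplesReadCounts", [])]
        space = [int(n) for n in data.get("samplingSpaceSizes", [])]
        sampled = bool(read) and bool(space)
        ratio = sum(read) / sum(space) if sampled and sum(space) else None
        results.append({"sampled": sampled, "sample_ratio": ratio})
    return results


# Hypothetical responses, trimmed to the fields that matter here.
sampled_response = {
    "reports": [
        {"data": {"samplesReadCounts": ["500000"],
                  "samplingSpaceSizes": ["1000000"]}}
    ]
}
unsampled_response = {"reports": [{"data": {"rowCount": 10}}]}

print(sampling_info(sampled_response))
print(sampling_info(unsampled_response))
```

If the report turns out to be sampled, shortening the date range per request or setting `"samplingLevel": "LARGE"` in the report request may reduce the effect, though it does not guarantee unsampled data.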

    If you are adding up the users of individual days, that is a conceptual mistake. If a user visits the site on day 1, again on day 2, and again on day 3, a report split by date shows 1 user on each day. Adding those gives 3, but it is the same person, so over the 3 days you have 1 user. Users cannot be summed across days that way.
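
The point about not summing daily users can be illustrated with a small sketch. The visit data and client IDs below are invented for the example and do not come from the question.

```python
from collections import defaultdict

# Hypothetical (date, client_id) rows: cid-A visits on three days,
# cid-B visits once.
visits = [
    ("2021-03-01", "cid-A"),
    ("2021-03-02", "cid-A"),
    ("2021-03-03", "cid-A"),
    ("2021-03-01", "cid-B"),
]

# Distinct users per day, which is what a report split by date shows.
users_per_day = defaultdict(set)
for day, client_id in visits:
    users_per_day[day].add(client_id)
daily_counts = {day: len(ids) for day, ids in users_per_day.items()}

# Summing the daily figures counts cid-A once per day visited.
sum_of_daily = sum(daily_counts.values())

# Distinct users over the whole range counts each client ID once.
distinct_users = len({client_id for _, client_id in visits})

print(daily_counts)
print(sum_of_daily, distinct_users)
```

Here the daily counts add up to 4 while only 2 distinct users exist, so a total for the whole period has to be computed over distinct client IDs and not over the daily rows.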