google-analytics, google-analytics-api

Google Analytics API download of multiple dimensions - numbers decrease with more dimensions


I am in a quandary on this GA issue. I have a number of custom dimensions, including a user identifier, browser timestamp and section name. I'm downloading the results from GA using the v4 API with Python.

When I download the user ID (along with ga:eventCategory, ga:eventAction and ga:eventLabel), I get around 12K rows for a single day, which I believe is correct. When I add the timestamp, the numbers increase, as expected, to about 15K rows.

But when I add the final custom dimension, section name, the numbers decrease. Supposedly that dimension is always passed and defined.

This is counterintuitive to me. Why would the number of rows decrease when another dimension is added to the batch query?


Solution

  • The problem here is that the final custom dimension was not always defined. So when I added it to the batch query, the rows where it was not defined were dropped.

    This should be spelled out in big letters in the Google Analytics documentation: if a dimension is not set, you will lose that row when querying for that dimension. There should never be an empty value; instead, send a placeholder such as UNDEFINED. Then you can search your downloads for that keyword.

    To find this problem, I downloaded all the results with the exception of that final dimension to a file all_but_section.csv. Then I downloaded all the results with the final dimension to a file all_with_section.csv and removed the section column from that second CSV. With a bit of UNIX trickery, this gives you the rows that are missing the dimension (uniq -u prints only the lines that occur exactly once in the combined, sorted input):

    cat all_but_section.csv all_with_section.csv | sort | uniq -u
    

    If somebody is interested, I can also provide a little Python script I built to extract a column.
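
For readers who would rather do the whole comparison in Python, here is a minimal sketch of the same idea. It is not the script mentioned above. The function names and the position of the section column are assumptions made for illustration. Unlike the uniq -u approach, it compares the rows as multisets, so duplicate rows within one file are counted rather than collapsed.

```python
import csv
from collections import Counter


def read_csv(path):
    """Load a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def drop_column(rows, index):
    """Return copies of the rows with the column at position `index` removed."""
    return [row[:index] + row[index + 1:] for row in rows]


def missing_rows(rows_without_dim, rows_with_dim, dim_index):
    """Return rows from the first download that vanish once the extra
    dimension is queried. `dim_index` is the (assumed) position of the
    extra dimension's column in the second download."""
    baseline = Counter(tuple(row) for row in rows_without_dim)
    stripped = Counter(tuple(row) for row in drop_column(rows_with_dim, dim_index))
    # Counter subtraction keeps only positive counts, i.e. the rows that
    # occur more often without the dimension than with it.
    return sorted((baseline - stripped).elements())
```

With the two files from above, and assuming the section name is the fourth column, the call would look like `missing_rows(read_csv("all_but_section.csv"), read_csv("all_with_section.csv"), 3)`. If the downloads also contain metric columns, those would need to match between the two files for the comparison to be meaningful.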