
Regexp filtering by custom dimensions in Google Analytics API in Python


I am querying the Google Analytics API in Python 2.7. Below are two code examples. The first one works, but does not do the filtering I want. The second one does not work; it is my attempt to filter results by the content of a custom GA dimension, in this case whether a page carries a particular custom GA tag.

This is how I query the GA API, which works, but does not involve the filtering I need:

import datetime
import time
from datetime import timedelta

# get start and end dates in the 'YYYY-MM-DD' format the API expects
# (d1 and d2 are date/datetime objects defined earlier, not shown here)
start_date = d1.strftime('%Y-%m-%d')
end_date   = d2.strftime('%Y-%m-%d')

# Create ReportRequest object
response = service.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': 'ga:32981293',

                'dateRanges': [{'startDate': start_date, 'endDate': end_date}],

                'metrics': [{'expression': 'ga:sessions'},
                            {'expression': 'ga:pageviews'},
                            {'expression': 'ga:users'},
                            {'expression': 'ga:exits'},
                            {'expression': 'ga:avgSessionDuration'},
                            {'expression': 'ga:avgTimeonPage'},
                            {'expression': 'ga:sessionsPerUser'},
                            {'expression': 'ga:percentNewSessions'},
                            {'expression': 'ga:bounceRate'}],

                'dimensions': [{"name": "ga:pagePath"}],

                'orderBys': [{"fieldName": "ga:pageviews", "sortOrder": "DESCENDING"}],

                'pageSize': 500

            }]
    }
).execute()

So, the above code gives me an object where my Page Paths are sorted by pageviews, along with the associated metrics that I asked for.
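For readers who want to inspect that object, here is a minimal sketch of flattening it into one record per row. It assumes the documented Reporting API v4 response layout (`reports`, `columnHeader`, `data.rows`) and a single date range per request. The helper name `rows_from_report` and the sample response are made up for illustration and are not real query results.

```python
def rows_from_report(response):
    """Flatten a batchGet response into a list of {column name: value} dicts."""
    rows = []
    for report in response.get('reports', []):
        header = report.get('columnHeader', {})
        dim_names = header.get('dimensions', [])
        metric_entries = header.get('metricHeader', {}).get('metricHeaderEntries', [])
        metric_names = [entry['name'] for entry in metric_entries]
        for row in report.get('data', {}).get('rows', []):
            record = dict(zip(dim_names, row.get('dimensions', [])))
            # row['metrics'] has one entry per date range; only one was requested
            record.update(zip(metric_names, row['metrics'][0]['values']))
            rows.append(record)
    return rows


# Hand-written sample in the v4 response shape (values are illustrative only)
sample_response = {
    'reports': [{
        'columnHeader': {
            'dimensions': ['ga:pagePath'],
            'metricHeader': {'metricHeaderEntries': [
                {'name': 'ga:sessions', 'type': 'INTEGER'},
                {'name': 'ga:pageviews', 'type': 'INTEGER'},
            ]},
        },
        'data': {'rows': [
            {'dimensions': ['/news/a'], 'metrics': [{'values': ['10', '25']}]},
        ]},
    }]
}

flat_rows = rows_from_report(sample_response)
```

The API returns metric values as strings, so they would still need converting to numbers before any arithmetic.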

What I want to do is add a filter so that I only get this information for Page Paths that contain a certain tag. In this case, I have a custom GA dimension called "News Page Tags" and I want to get information only for Page Paths that have specific News Page Tags.

Here's one attempt at this, which does not work. I'm hoping there is some kind of syntax issue that someone can help me clear up.

import datetime
import time
from datetime import timedelta

# get start and end dates in the 'YYYY-MM-DD' format the API expects
# (d1 and d2 are date/datetime objects defined earlier, not shown here)
start_date = d1.strftime('%Y-%m-%d')
end_date   = d2.strftime('%Y-%m-%d')

cur_tag = 'example string'

# Create ReportRequest object
response = service.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': 'ga:32981293',

                'dateRanges': [{'startDate': start_date, 'endDate': end_date}],

                'metrics': [{'expression': 'ga:sessions'},
                            {'expression': 'ga:pageviews'},
                            {'expression': 'ga:users'},
                            {'expression': 'ga:exits'},
                            {'expression': 'ga:avgSessionDuration'},
                            {'expression': 'ga:avgTimeonPage'},
                            {'expression': 'ga:sessionsPerUser'},
                            {'expression': 'ga:percentNewSessions'},
                            {'expression': 'ga:bounceRate'}],

                'dimensions': [{"name": "ga:pagePath"},
                               {"name": "ga:newsPageTags"}],

                'orderBys': [{"fieldName": "ga:pageviews", "sortOrder": "DESCENDING"}],

                'dimensionFilterClauses': [
                    {"filters": [{"dimensionName": "ga:newsPageTags",
                                  "operator": "REGEXP",
                                  "expressions": [cur_tag]}]}
                ],

                'pageSize': 500

            }]
    }
).execute()

So, in the above, I added "ga:newsPageTags" as a second dimension, with the intention of filtering pages so that I only get metrics for pages that have "cur_tag" as one of their News Page Tags.

Running that second chunk of code with the filter yields the following error:

Traceback (most recent call last):
  File "", line 31, in
  File "build/bdist.macosx-10.7-x86_64/egg/oauth2client/_helpers.py", line 133, in positional_wrapper
  File "build/bdist.macosx-10.7-x86_64/egg/googleapiclient/http.py", line 842, in execute
googleapiclient.errors.HttpError: https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json returned "Unknown dimension(s): ga:newsPageTags For details see https://developers.google.com/analytics/devguides/reporting/core/dimsmets.">

Obviously I am not specifying my custom dimension correctly, and I am hoping there is a way to query the GA API to filter based on custom tags, although I haven't had any luck finding good documentation to help me solve this problem.

Thanks!


Solution

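  • Custom dimensions are specified in the API by index rather than by name, in the form ga:dimensionXX, where XX is the index of the custom dimension. So, for example, you would filter by ga:dimension15 rather than ga:newsPageTags. The same applies everywhere the dimension appears in the request, including the 'dimensions' list.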

    https://developers.google.com/analytics/devguides/reporting/core/dimsmets#view=detail&group=custom_variables_or_columns
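A minimal sketch of how the request from the question might look with that change applied. The helper name `build_tag_report_body` is hypothetical, the index 15 is only the example number used above (the real index for "News Page Tags" has to be looked up in the GA property's custom dimension settings), and the dates are placeholders. The metric list is shortened for brevity.

```python
def build_tag_report_body(view_id, start_date, end_date, tag_regex, tag_dimension_index):
    """Build a batchGet body that filters on a custom dimension by its index."""
    # Custom dimensions are addressed as ga:dimensionXX, not by display name
    tag_dimension = 'ga:dimension{0}'.format(tag_dimension_index)
    return {
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': start_date, 'endDate': end_date}],
            'metrics': [{'expression': 'ga:sessions'},
                        {'expression': 'ga:pageviews'}],
            'dimensions': [{'name': 'ga:pagePath'},
                           {'name': tag_dimension}],
            'orderBys': [{'fieldName': 'ga:pageviews', 'sortOrder': 'DESCENDING'}],
            'dimensionFilterClauses': [{
                'filters': [{
                    'dimensionName': tag_dimension,
                    'operator': 'REGEXP',
                    'expressions': [tag_regex],
                }]
            }],
            'pageSize': 500,
        }]
    }


# Assumed values: index 15 and the dates are placeholders, not real settings
body = build_tag_report_body('ga:32981293', '2017-01-01', '2017-01-31',
                             'example string', 15)

# With an authorized `service` object, as in the question:
# response = service.reports().batchGet(body=body).execute()
```

Building the dimension name once and reusing it keeps the 'dimensions' entry and the filter's 'dimensionName' in sync, which is where the original request went wrong in both places.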