
How to correctly add a filter to a request in Google Analytics using the library gaapi4py?


I need help with filters in a GA query. I am using the gaapi4py library (https://pypi.org/project/gaapi4py/#description) and have the following request:

from gaapi4py import GAClient

KEY_FILE_LOCATION = r'C:/.../key.json'
c = GAClient(json_keyfile=KEY_FILE_LOCATION)

....
request = c.get_all_data({
   'view_id': 'id',  
   'start_date': '2022-07-05',
   'end_date': '2022-07-05',
   'dimensions': {'ga:date'}, 
   'metrics': {'ga:sessions'},
   'filter': 'ga:pagePath==',  # ???? how do I make this a PARTIAL match?
})

How can I add 'operator': 'PARTIAL' to the filter so that it matches part of the link, '/name/id/'? And a second question: how can I use operator: 'IN_LIST' when filtering on ga:eventCategory? Thanks.


Solution

  • Unfortunately, the library currently doesn't support the "dimensionFilterClauses" field, which is where advanced filter clauses with operators such as 'PARTIAL' are defined. However, because Reporting API v4 filters are backwards compatible with the Core Reporting API (v3) filter syntax, filters like the ones in your question can be replicated with the older syntax:

    • The PARTIAL operator can be replaced with the Core Reporting API's "=@" ("contains substring") operator.
    • The IN_LIST operator can be replaced with the "=~" regex operator: to query events whose eventCategory is one of ('my-event', 'some-other-event'), write the filter ga:eventCategory=~^(my-event|some-other-event)$, listing the values inside the parentheses separated by pipe symbols (|).
    • To combine conditions, use a comma (,) for OR and a semicolon (;) for AND.
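The operator mapping in the list above can be written as a small translator. This is a sketch under assumptions: the input dicts only mimic the shape of a v4 `DimensionFilter` (`dimensionName`, `operator`, `expressions`, `not`); `to_v3` and `combine` are hypothetical helpers, not part of gaapi4py; and rather than guessing how v3 escapes `,` and `;` inside values, it simply rejects such values. The EXACT, BEGINS_WITH and ENDS_WITH branches go slightly beyond the two operators asked about and reuse the same v3 operators (==, !=, =@, !@, =~, !~).

```python
"""Translate v4-style dimension filter dicts into v3 filter strings
(hypothetical helpers; gaapi4py itself only takes the final string)."""

# Characters with special meaning in a regular expression. Escaping them lets
# literal values such as '/a.b' be used safely inside =~ / !~ filters.
_REGEX_META = set("\\.^$|?*+()[]{}")


def _regex_literal(value: str) -> str:
    """Backslash-escape regex metacharacters so the value matches literally."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in value)


def _check(value: str) -> str:
    # ',' (OR) and ';' (AND) separate clauses in a v3 filter string; this
    # sketch refuses such values instead of guessing how to escape them.
    if "," in value or ";" in value:
        raise ValueError(f"value contains a filter separator: {value!r}")
    return value


def to_v3(clause: dict) -> str:
    """Convert one v4-style clause dict into a v3 filter expression."""
    dim = clause["dimensionName"]
    op = clause["operator"]
    exprs = [_check(e) for e in clause["expressions"]]
    negate = clause.get("not", False)

    if op == "EXACT":
        return f"{dim}{'!=' if negate else '=='}{exprs[0]}"
    if op == "PARTIAL":
        # v3 "contains substring" operator.
        return f"{dim}{'!@' if negate else '=@'}{exprs[0]}"

    # Every remaining operator is expressed as a regular expression.
    if op == "REGEXP":
        pattern = exprs[0]
    elif op == "BEGINS_WITH":
        pattern = "^" + _regex_literal(exprs[0])
    elif op == "ENDS_WITH":
        pattern = _regex_literal(exprs[0]) + "$"
    elif op == "IN_LIST":
        # Anchored alternation: the value must equal one of the list items.
        pattern = "^(" + "|".join(_regex_literal(e) for e in exprs) + ")$"
    else:
        raise ValueError(f"unsupported operator: {op}")
    return f"{dim}{'!~' if negate else '=~'}{pattern}"


def combine(*clauses: dict, operator: str = "AND") -> str:
    """Join clauses with ';' (AND) or ',' (OR)."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"operator must be 'AND' or 'OR', got {operator!r}")
    sep = ";" if operator == "AND" else ","
    return sep.join(to_v3(c) for c in clauses)
```

For the question's case, a PARTIAL clause on ga:pagePath with `["/name/id/"]` combined with an IN_LIST clause on ga:eventCategory with `["event1", "event2", "event3"]` yields `ga:pagePath=@/name/id/;ga:eventCategory=~^(event1|event2|event3)$`, which is the string passed as 'filter'.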

    More information can be found on this page: https://developers.google.com/analytics/devguides/reporting/core/v3/reference#filters

    So, to filter by both page and event category, add the following filter string to the request body:

    from gaapi4py import GAClient
    
    KEY_FILE_LOCATION = r'C:/.../key.json'
    c = GAClient(json_keyfile=KEY_FILE_LOCATION)
    
    ....
    request = c.get_all_data({
       'view_id': 'id',  
       'start_date': '2022-07-05',
       'end_date': '2022-07-05',
       'dimensions': {'ga:date'}, 
       'metrics': {'ga:sessions'},
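       # =@ is "contains substring" (stand-in for PARTIAL), =~ is a regex match
       # (stand-in for IN_LIST), and ; joins the two conditions with AND
       'filter': 'ga:pagePath=@/name/id/;ga:eventCategory=~^(event1|event2|event3)$'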
    })
    

    (On a side note: if you filter by events, you should use hit-scope metrics in the "metrics" field instead of sessions; ga:hits, ga:totalEvents or ga:uniqueEvents will give results closer to what you expect.)
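A minimal sketch of a request body following that side note, assuming you keep the same filter string and the dict keys used by gaapi4py's get_all_data() in the question. `event_report_body` is a hypothetical helper, not part of gaapi4py; it only builds the dict, so it runs without credentials.

```python
def event_report_body(view_id: str, start_date: str, end_date: str,
                      filter_expr: str) -> dict:
    """Build a gaapi4py-style request body (hypothetical helper)."""
    return {
        "view_id": view_id,
        "start_date": start_date,
        "end_date": end_date,
        "dimensions": {"ga:date"},
        # Event metrics (hit scope) in place of the session-scoped ga:sessions.
        "metrics": {"ga:totalEvents", "ga:uniqueEvents"},
        "filter": filter_expr,
    }


body = event_report_body(
    "id", "2022-07-05", "2022-07-05",
    "ga:pagePath=@/name/id/;ga:eventCategory=~^(event1|event2|event3)$",
)
# With a configured GAClient `c` as in the question: c.get_all_data(body)
```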