Tags: python, google-analytics-api, google-analytics-firebase

Update the input of a loop with a result from the previous iteration of the loop


(I've added the google-analytics api tags but I suspect that my issue is more a fundamental flaw in my approach to a loop, detailed below)

I'm using Python to query the Google Analytics API (V4). Having already connected to the API successfully with my credentials, I'm trying to loop over each 10k-row page returned by the API to get the full result set.

When querying the API you pass a dict that looks something like this:

{'reportRequests':[{'viewId': '1234567', # my actual view id goes here of course
    'pageToken': 'go', # can be any string initially (I think?)
    'pageSize': 10000,
    'samplingLevel': 'LARGE',
    'dateRanges': [{'startDate': '2018-06-01', 'endDate': '2018-07-13'}],
    'dimensions': [{'name': 'ga:date'}, {'name': 'ga:dimension1'}, {'name': 'ga:dimension2'}, {'name': 'ga:userType'}, {'name': 'ga:landingpagePath'}, {'name': 'ga:deviceCategory'}],
    'metrics': [{'expression': 'ga:sessions'}, {'expression': 'ga:bounces'}, {'expression': 'ga:goal1Completions'}]}]}

According to the Google Analytics API V4 documentation for the pageToken parameter:

"A continuation token to get the next page of the results. Adding this to the request will return the rows after the pageToken. The pageToken should be the value returned in the nextPageToken parameter in the response to the reports.batchGet request. "

My understanding is that I need to query the API in chunks of 10,000 (max query result size allowed) and that to do this I must pass the value of nextPageToken field returned in each query result into the new query.

From my research, it sounds like the nextPageToken field will be an empty string once all the results have been returned.
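The token-passing pattern described above can be sketched without touching the real API. In the sketch below, `fake_batch_get`, `ROWS` and `fetch_all` are hypothetical names invented for illustration, and the small page size is arbitrary. Whether the last page carries an empty nextPageToken or omits the field entirely is not established in this post, so the loop uses `.get` and treats both cases as "done".

```python
# Hypothetical stand-in for the real API: serves ROWS in pages of `page_size`.
ROWS = list(range(25))

def fake_batch_get(page_token, page_size=10):
    """Return one page of ROWS plus a token pointing at the next page."""
    start = int(page_token) if page_token else 0
    end = start + page_size
    report = {'rows': ROWS[start:end]}
    if end < len(ROWS):
        report['nextPageToken'] = str(end)  # more rows remain
    return {'reports': [report]}

def fetch_all():
    """Follow nextPageToken until the response no longer provides one."""
    rows, token = [], None
    while True:
        report = fake_batch_get(token)['reports'][0]
        rows.extend(report['rows'])
        token = report.get('nextPageToken')  # None or '' both mean "done"
        if not token:
            return rows

all_rows = fetch_all()
```

The key point is that each request is made with the token taken from the previous response, never with the token the loop started with.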

So, I tried a while loop. To get to the loop stage I built some functions:

## generates the dimensions in the right format to use in the query
def generate_dims(dims):
    return [{'name': d} for d in dims]

## generates the metrics in the right format to use in the query
def generate_metrics(mets):
    return [{'expression': m} for m in mets]

## generate the query dict
def query(pToken, dimensions, metrics, start, end):
    api_query = {
            'reportRequests': [
                    {'viewId': VIEW_ID,
                     'pageToken': pToken,          
                     'pageSize': 10000,
                     'samplingLevel': 'LARGE',
                     'dateRanges': [{'startDate': start, 'endDate': end}],
                     'dimensions': generate_dims(dimensions),
                     'metrics': generate_metrics(metrics)
                     }]
    }
    return(api_query)

Example usage of the three functions above:

sessions1_qr = query(pToken = pageToken,
                     dimensions = ['ga:date', 'ga:dimension1', 'ga:dimension2',
                                   'ga:userType', 'ga:landingpagePath',
                                   'ga:deviceCategory'],
                     metrics = ['ga:sessions', 'ga:bounces', 'ga:goal1Completions'],
                     start = '2018-06-01',
                     end = '2018-07-13')

The resulting dict looks like the first code block in this post.

So far so good. Here's the loop I attempted:

def main(query):
    global pageToken, store_response

    # debugging, was hoping to see print output on each iteration (I didn't)
    print(pageToken)

    while pageToken != "":
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, query)
        pageToken = response['reports'][0]['nextPageToken'] # < IT ALL COMES DOWN TO THIS LINE HERE
        store_response['pageToken'] = response

    return(False) # don't actually need the function to return anything, just append to global store_response.

Then I tried to run it:

pageToken = "go" # can be any string to get started
store_response = {}
sessions1 = main(sessions1_qr)

The following happens:

  • The console remains busy
  • The line print(pageToken) prints once to the console, showing the initial value of pageToken
  • store_response dict has one item in it, not many as was hoped for

So, it looks like my loop runs once only.

Having stared at the code, I suspect it has something to do with the value of the query parameter that I pass to main(). When I initially call main(), the value of query is the same as the first code block above (the variable sessions1_qr, the dict with all the API call parameters). On each loop iteration this is supposed to update so that the value of pageToken is replaced with the response's nextPageToken value.

Put another way, and in short: I need to update the input of the loop with a result from the previous iteration of the loop. My logic is clearly flawed, so any help is very much appreciated.
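That suspicion can be checked without calling the API. In this sketch, `build_query` is a hypothetical, trimmed-down version of `query()`, and the token value '10000' is made up. The dict captures the token's value at the moment it is built, so rebinding the variable afterwards leaves the dict unchanged.

```python
def build_query(p_token):
    # Trimmed-down, hypothetical version of query() above.
    return {'reportRequests': [{'pageToken': p_token, 'pageSize': 10000}]}

page_token = 'go'
q = build_query(page_token)

# Rebinding the variable does not touch the dict that was built earlier.
page_token = '10000'
stale = q['reportRequests'][0]['pageToken']

# The dict has to be updated explicitly (or rebuilt) to carry the new token.
q['reportRequests'][0]['pageToken'] = page_token
fresh = q['reportRequests'][0]['pageToken']
```

Here `stale` still holds the starting token while `fresh` holds the new one, which is consistent with a loop that keeps requesting the same page.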



Solution

  • This is the approach I would take to solve this:

    def main(query):
        global pageToken, store_response
    
        while pageToken != "":
            # debugging: prints the token used for each request
            print(pageToken)
            analytics = initialize_analyticsreporting()
            response = get_report(analytics, query)
    
            # note that this has changed -- you were using 'pageToken' as a key
            # which would overwrite each response
            store_response[pageToken] = response
    
            # update the pageToken; fall back to '' in case the last page omits the key
            pageToken = response['reports'][0].get('nextPageToken', '')
            query['reportRequests'][0]['pageToken'] = pageToken # update the query
    
    
        return(False) # don't actually need the function to return anything, just append to global store_response.
    

    In other words, update the query data structure manually on each iteration, and store each response with its pageToken as the dictionary key.

    Presumably the last page has '' as the nextPageToken so your loop will stop.
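As a variation on the same idea, here is a sketch that avoids the globals and returns the responses as a list. It rests on a few assumptions: `fetch_all_pages`, `fetch` and `stub_fetch` are hypothetical names, `fetch` stands in for something like `lambda q: get_report(analytics, q)`, and the first request is sent without a pageToken rather than with a placeholder string. The stub only imitates the shape of a response so that the loop can be run locally.

```python
import copy

def fetch_all_pages(fetch, query):
    """Call fetch(query) repeatedly, following nextPageToken.

    `fetch` is a hypothetical callable standing in for
    lambda q: get_report(analytics, q).
    """
    query = copy.deepcopy(query)        # leave the caller's dict untouched
    request = query['reportRequests'][0]
    request.pop('pageToken', None)      # first request: no token
    responses = []
    while True:
        response = fetch(query)
        responses.append(response)
        token = response['reports'][0].get('nextPageToken')
        if not token:                   # missing or empty: last page
            return responses
        request['pageToken'] = token    # feed the token into the next request

# Stub API for demonstration only: three pages, tokens '1' and '2'.
def stub_fetch(query):
    token = query['reportRequests'][0].get('pageToken')
    page = int(token) if token else 0
    report = {'page': page}
    if page < 2:
        report['nextPageToken'] = str(page + 1)
    return {'reports': [report]}

original = {'reportRequests': [{'viewId': '1234567', 'pageSize': 10000}]}
pages = fetch_all_pages(stub_fetch, original)
```

Passing the fetch function in as an argument keeps the loop testable, and copying the query means the caller's dict is not modified as a side effect.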