Tags: python, python-3.x, pandas, dataframe, google-analytics-api

Google Analytics Data to Pandas Dataframe


I'm trying to load Google Analytics data into a pandas DataFrame using the Google Analytics API. I've followed the code examples in the official documentation, and I now have code that prints out the data I need. I need help figuring out how to put the data into a pandas DataFrame instead of just printing it.

Once I execute the query, this is the raw output I get:

{'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}

Google's documentation uses this function to print the results:

def print_results(results):

    # Print header.
    output = []
    for header in results.get('columnHeaders'):
        output.append('%30s' % header.get('name'))
    print(''.join(output))

    # Print data table.
    if results.get('rows', []):
        for row in results.get('rows'):
            output = []
            for cell in row:
                output.append('%30s' % cell)
            print(''.join(output))
    else:
        print('No Rows Found')

As you can see, the column headers come from the 'name' of each entry in results['columnHeaders'] (a list of dicts), and the data comes from results['rows']; both need to be fed into a pandas DataFrame.
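The mapping described above can be sketched in a few lines. Because 'columnHeaders' is a list of dicts, the names have to be collected from each entry rather than indexed as results['columnHeaders']['name']. The snippet uses a trimmed copy of the response (two of the eight rows) for brevity; note that every cell, metrics included, arrives as a string.

```python
import pandas as pd

# Trimmed copy of the response from the question: only the two keys that matter
results = {
    'columnHeaders': [
        {'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'},
        {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
        {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
    ],
    'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51']],
}

# 'columnHeaders' is a list, so collect the 'name' of each entry
columns = [header['name'] for header in results['columnHeaders']]
# 'rows' is already a list of lists in the same column order as the headers
df = pd.DataFrame(results['rows'], columns=columns)
print(df)
```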

How can I create a function to put this data in a dataframe?


Solution

  • Try the code below:

    import pandas as pd
    results = {'kind': 'analytics#gaData', 'id': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'query': {'start-date': '7daysAgo', 'end-date': 'today', 'ids': 'ga:XXXXXXX', 'dimensions': 'ga:date', 'metrics': ['ga:sessions', 'ga:transactions'], 'start-index': 1, 'max-results': 1000}, 'itemsPerPage': 1000, 'totalResults': 8, 'selfLink': 'https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXXXX&dimensions=ga:date&metrics=ga:sessions,ga:transactions&start-date=7daysAgo&end-date=today', 'profileInfo': {'profileId': 'XXXXXXX', 'accountId': 'XXXXXXX', 'webPropertyId': 'XXXXXXX', 'internalWebPropertyId': 'XXXXXXX', 'profileName': 'XXXXXXX', 'tableId': 'ga:XXXXXXX'}, 'containsSampledData': False, 'columnHeaders': [{'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'}, {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}, {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'}], 'totalsForAllResults': {'ga:sessions': '86913', 'ga:transactions': '312'}, 'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51'], ['20200116', '11396', '38'], ['20200117', '11097', '28'], ['20200118', '10490', '46'], ['20200119', '9829', '34'], ['20200120', '12280', '36'], ['20200121', '8804', '38']]}
    
    def results_to_dataframe(results):
        # Column names: the 'name' of each entry in the 'columnHeaders' list
        column_names = [header.get('name') for header in results.get('columnHeaders', [])]
        # Rows: a list of lists, one value per column; the key may be absent when there are no rows
        data = results.get('rows', [])
        return pd.DataFrame(data, columns=column_names)
    
    df = results_to_dataframe(results)
    # prints the dataframe
    print(df)
    
    #output
        ga:date ga:sessions ga:transactions
    0  20200114       11965              41
    1  20200115       11052              51
    2  20200116       11396              38
    3  20200117       11097              28
    4  20200118       10490              46
    5  20200119        9829              34
    6  20200120       12280              36
    7  20200121        8804              38
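Because every cell in 'rows' is a string (note the quoted values in the raw output), the DataFrame above holds text rather than numbers. The sketch below is an assumed extension, not part of the original answer: it uses each header's 'dataType' to convert non-STRING columns with pd.to_numeric, parses ga:date (formatted YYYYMMDD) with pd.to_datetime, and drops the 'ga:' prefix from the column names. The function name results_to_typed_dataframe is hypothetical, and treating every non-STRING dataType as numeric is an assumption.

```python
import pandas as pd

# Trimmed copy of the response from the question (two of the eight rows)
results = {
    'columnHeaders': [
        {'name': 'ga:date', 'columnType': 'DIMENSION', 'dataType': 'STRING'},
        {'name': 'ga:sessions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
        {'name': 'ga:transactions', 'columnType': 'METRIC', 'dataType': 'INTEGER'},
    ],
    'rows': [['20200114', '11965', '41'], ['20200115', '11052', '51']],
}

def results_to_typed_dataframe(results):
    """Hypothetical helper: build a DataFrame and convert column types."""
    headers = results.get('columnHeaders', [])
    df = pd.DataFrame(results.get('rows', []),
                      columns=[h['name'] for h in headers])
    for h in headers:
        # Assumption: every non-STRING dataType holds a numeric value
        if h.get('dataType') != 'STRING':
            df[h['name']] = pd.to_numeric(df[h['name']])
    if 'ga:date' in df.columns:
        # ga:date arrives as YYYYMMDD text
        df['ga:date'] = pd.to_datetime(df['ga:date'], format='%Y%m%d')
    # Drop the 'ga:' prefix for shorter column names
    df.columns = [c[3:] if c.startswith('ga:') else c for c in df.columns]
    return df

df = results_to_typed_dataframe(results)
print(df.dtypes)
```

With real numeric and datetime columns, operations such as df['sessions'].sum() or df.set_index('date') behave as expected instead of working on strings.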