Search code examples
pythonpandasdataframegoogle-analyticsgoogle-analytics-api

How do you convert multiple Dimensions and Metrics into a Pandas Dataframe using the Google Analytics Reporting API?


I have trouble merging data from the Google Analytics Reporting API into a Pandas Dataframe. Although the request and data collection are fine, I have trouble putting it in a Pandas DataFrame whenever I get multiple dimensions and metrics.

The output is a extremely long list of all our products. It starts with the dimension (productname and SKU) and then later in the list it passes the metrics (revenue and quantity) Example:

['PRODUCT1', '1234', 'PRODUCT2', '5678'..... 13.0, 324.0, 3.0, 322.0]

Errors I've ran into when converting to a DF:

ValueError: Length of values does not match length of index

and

"None of [Index(['Product', 'SKU', 'Revenue', 'Quantity'], dtype='object')] are in the [columns]"

Any ideas on how to put this into a proper dataFrame? I've used this article as a start but it only explains on how to export 1 dimension and 1 metric: https://www.jcchouinard.com/google-analytics-api-using-python/

My code:


from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = 'client_secrets.json'
VIEW_ID = '123456789' #Random number

credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)

analytics = build('analyticsreporting', 'v4', credentials=credentials)

response = analytics.reports().batchGet(
    body={
        'reportRequests': [
            {
                'viewId': VIEW_ID, #Add View ID from GA
                'dateRanges': [
                    {'startDate': '30daysAgo', 'endDate': 'today'},
                    ],
                'metrics': [
                    {'expression': 'ga:itemRevenue'},
                    {'expression': 'ga:itemQuantity'}
                ],
                'dimensions': [
                    {
                        "name": "ga:productName"
                    },{
                        "name": "ga:productSku"
                    }
                    ],
                #"filtersExpression":"ga:pagePath=~products;ga:pagePath!@/translate", #Filter by condition "containing products"
                'orderBys': [{"fieldName": "ga:itemRevenue", "sortOrder": "DESCENDING"}],
                'pageSize': 1000
            }]
    }
).execute()


dim = []
val = []

#Extract Data
for report in response.get('reports', []):

    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])

    for row in rows:

        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])

        for header, dimension in zip(dimensionHeaders, dimensions):
            dim.append(dimension)


        for i, values in enumerate(dateRangeValues):
            for metricHeader, value in zip(metricHeaders, values.get('values')):
                val.append(float(value))

df = pd.DataFrame()
df["Revenue", "Quantity"]=val
df["Product", "SKU" ]=dim
df=df[["Product", "SKU","Revenue", "Quantity"]]
print(df)



Solution

  • I've solved this by using the gapandas package.

    This Google Analytics API doesn't work that great for these operations. Use this one instead:

    https://github.com/flyandlure/gapandas