I have trouble merging data from the Google Analytics Reporting API into a Pandas Dataframe. Although the request and data collection are fine, I have trouble putting it in a Pandas DataFrame whenever I get multiple dimensions and metrics.
The output is a extremely long list of all our products. It starts with the dimension (productname and SKU) and then later in the list it passes the metrics (revenue and quantity) Example:
['PRODUCT1', '1234', 'PRODUCT2', '5678'..... 13.0, 324.0, 3.0, 322.0]
Errors I've ran into when converting to a DF:
ValueError: Length of values does not match length of index
and
"None of [Index(['Product', 'SKU', 'Revenue', 'Quantity'], dtype='object')] are in the [columns]"
Any ideas on how to put this into a proper dataFrame? I've used this article as a start but it only explains on how to export 1 dimension and 1 metric: https://www.jcchouinard.com/google-analytics-api-using-python/
My code:
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = 'client_secrets.json'
VIEW_ID = '123456789' #Random number
credentials = ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION, SCOPES)
analytics = build('analyticsreporting', 'v4', credentials=credentials)
response = analytics.reports().batchGet(
body={
'reportRequests': [
{
'viewId': VIEW_ID, #Add View ID from GA
'dateRanges': [
{'startDate': '30daysAgo', 'endDate': 'today'},
],
'metrics': [
{'expression': 'ga:itemRevenue'},
{'expression': 'ga:itemQuantity'}
],
'dimensions': [
{
"name": "ga:productName"
},{
"name": "ga:productSku"
}
],
#"filtersExpression":"ga:pagePath=~products;ga:pagePath!@/translate", #Filter by condition "containing products"
'orderBys': [{"fieldName": "ga:itemRevenue", "sortOrder": "DESCENDING"}],
'pageSize': 1000
}]
}
).execute()
dim = []
val = []
#Extract Data
for report in response.get('reports', []):
columnHeader = report.get('columnHeader', {})
dimensionHeaders = columnHeader.get('dimensions', [])
metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
rows = report.get('data', {}).get('rows', [])
for row in rows:
dimensions = row.get('dimensions', [])
dateRangeValues = row.get('metrics', [])
for header, dimension in zip(dimensionHeaders, dimensions):
dim.append(dimension)
for i, values in enumerate(dateRangeValues):
for metricHeader, value in zip(metricHeaders, values.get('values')):
val.append(float(value))
df = pd.DataFrame()
df["Revenue", "Quantity"]=val
df["Product", "SKU" ]=dim
df=df[["Product", "SKU","Revenue", "Quantity"]]
print(df)
I've solved this by using the gapandas
package.
This Google Analytics API doesn't work that great for these operations. Use this one instead: