python, request, python-requests, gooddata

Issues retrieving information from API


Unfortunately, I cannot offer a reproducible dataset. I'm connecting to the GoodData API to pull out report data. I've been able to connect and download the report successfully, but occasionally it fails. It always fails at the same point in the script, and I can't figure out why it works sometimes and not others.

connect to gd api, get temporary token

I created the function below to download the report. Its parameters are the project ID within GoodData, the temporary token I received when authenticating, the name I want the output file to have, and the URI that I receive from calling the specific project and report ID. The URI is essentially the location of the data.

The URI looks something like this (not a real URI):

'{"uri":"/gdc/projects/omaes11n7jpaisfd87asdfhbakjsdf87adfbkajdf/execute/raw/876dfa8f87ds6f8fd6a8ds7f6a8da8sd7f68as7d6f87af?q=as8d7f6a8sd7fas8d7fa8sd7f6a8sdf7"}'

from urllib2 import Request, urlopen
import re
import json
import pandas as pd
import os
import time

# Download a GoodData report and return it as a dataframe
def download_report(proj_id, temp_token, file_name, uri, write_to_file=True):
    headers = {
          'Accept': 'application/json',
          'Content-Type': 'application/json',
          'X-GDC-AuthTT': temp_token
        }

    uri2 = re.sub('{"uri":|}|"', '', uri)

    put_request = Request('https://secure.gooddata.com' + uri2, headers=headers)

    response = urlopen(put_request).read()

    with open(file_name + ".csv", "wb") as text_file:
        text_file.write(response)

    with open(file_name + ".csv", 'rb') as f:
        gd_data = pd.read_csv(f)

    if write_to_file:
        gd_data.to_csv(file_name + '.csv', index=False)
    return gd_data

The URI is appended to the base GoodData URL and requested with the headers above. The response text is written to a CSV file, which is then loaded into a dataframe.
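As a side note, the path could also be pulled out of the JSON string with `json.loads` instead of a regular expression, which avoids stripping braces or quotes that might appear elsewhere in the value. This is only a sketch, assuming the string always has the `{"uri": ...}` shape shown above; the helper name `build_report_url` and the `BASE_URL` constant are hypothetical and not part of the original script.

```python
import json

# Base URL taken from the script above.
BASE_URL = "https://secure.gooddata.com"


def build_report_url(uri_json, base_url=BASE_URL):
    """Return the full URL for a JSON string of the form '{"uri": "/gdc/..."}'.

    Hypothetical helper for illustration only.
    """
    # Parse the JSON and read the "uri" field instead of deleting characters.
    path = json.loads(uri_json)["uri"]
    return base_url + path
```

The result could be passed to `Request(...)` in place of `'https://secure.gooddata.com' + uri2`.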

For some reason, the dataframe sometimes comes back containing just the URI instead of the data behind the link. One more thing I find strange: when I launch Spyder and try this, it always fails the first time. If I run it again, it works, and I don't know why. I'm trying to run this on a schedule; it runs successfully a couple of times a day for a few days and then just starts failing.


Solution

  • The reason you sometimes get the URI of the data result rather than the data itself is that the result is not ready yet. Computing a report can take a while. Along with the URI you also get HTTP status 202, which means the request was accepted but the result is not finished yet.

    Check the HTTP status with the getcode() method. If you get 202, request the URI again until you get 200, and then read the data result.
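The polling described above could look roughly like the following. This is a minimal sketch rather than a definitive implementation. It is written for Python 3, where `Request` and `urlopen` live in `urllib.request` (the question imports the same names from `urllib2` on Python 2). The function name `fetch_when_ready`, the `opener` parameter, and the retry count and delay defaults are assumptions made for illustration; they are not values documented by GoodData.

```python
import time
from urllib.request import Request, urlopen


def fetch_when_ready(url, headers, max_attempts=30, delay=2.0, opener=urlopen):
    """Request `url` until the server answers 200, then return the body.

    Hypothetical helper: the name, retry limit and delay are illustrative.
    """
    for _ in range(max_attempts):
        response = opener(Request(url, headers=headers))
        status = response.getcode()
        if status == 200:
            # Result is ready: the body is the actual report data.
            return response.read()
        if status != 202:
            # Anything other than 200 or 202 is not handled by this sketch.
            raise RuntimeError("Unexpected HTTP status: %s" % status)
        # 202 Accepted: the report is still being computed, so wait and retry.
        time.sleep(delay)
    raise RuntimeError("Report not ready after %d attempts" % max_attempts)
```

In `download_report`, the line `response = urlopen(put_request).read()` could then be replaced by a call such as `response = fetch_when_ready('https://secure.gooddata.com' + uri2, headers)`, so that the CSV is only written once the status is 200.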