python-3.x, url, request

How to handle ezproxy authorization to download data through API?


I have a token that gives me access to download large files from UN Comtrade. The original web page is http://comtrade.un.org/, but I have premium access through my university library subscription. So if I want to use the premium features, the website automatically redirects me to a login page, and after I press the login button the URL is https://ezproxy.nu.edu.kz:5588/data/dev/portal/. I am trying to send requests and download files through the API (using requests). I get a response from http://comtrade.un.org/, but in order to download I need to use https://ezproxy.nu.edu.kz:5588/data/dev/portal/, and when I tried to download, this error appeared:

urllib.error.HTTPError: HTTP Error 401: Unauthorized

How can I handle this problem?

import json
import urllib.request

import requests

# NOTE: url, dwld_url and token are used below but were not part of the posted snippet.
px = 'px=HS&'            # classification
freq = 'freq=A&'         # annual
trade_type = 'type=C&'   # commodity (renamed from `type`, which shadows the built-in)
auth = 'https://comtrade.un.org/api/getUserInfo?token=ZF5TSW8giRQMFHuPmS5JwQLZ5FB%2BNO0NCcjxFQUJADrLzCRDCkG5F0ZPnZTYQWO3MPgj96gZNF7Z9iN8BwscUMYBbXuDVYVDvsTAVNzAJ6FNC2dnN7gtB1rt9qJShAO467zBegHTLwvmlRIBSpjjwg%3D%3D'

with open('reporterAreas.json') as json_file:
    data = json.load(json_file)

ls = data['results']

for year in range(2011, 2021):
    print(year)
    ps = 'ps=' + str(year) + '&'
    for item in ls:
        r = item['id']                              # reporter country id
        report_country_txt = item['text']
        if r == 'all':
            req_url = 'r=' + r + '&' + px + ps + trade_type + freq + token
            request = url + req_url
            response = requests.get(request)
            if response.status_code != 200:
                print("Request failed:", response.status_code)
                continue
            print("Response is OK!")
            result = response.json()[0]
            download_url = dwld_url + result['downloadUri']
            print(download_url)
            filename = str(year) + '_' + report_country_txt + '.zip'
            # Download the file itself, not the API query URL
            urllib.request.urlretrieve(download_url, filename)

Solution

  • I'm not sure whether EZproxy provides an API or SDK for authenticating a request, but I don't think it does.

    What you can do is pass your EZproxy session along with your request. Because the request then carries a valid session, it will no longer be treated as unauthorized.

    Note that you can retrieve your EZproxy session ID from your browser cookies.

    Finally, you have to make your request against the starting point URL.


    Otherwise, you can use Selenium to fill in the login form automatically and retrieve the EZproxy session ID, then pass it to your requests.

    I hope this helps!
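As a rough illustration of the cookie approach described above, here is a minimal sketch that attaches a session cookie to a urllib download. The function names `build_ezproxy_request` and `download_file` are made up for this example, and the cookie name `ezproxy` is an assumption: check the actual name and value in your browser's cookies after logging in. The session ID is assumed to be copied by hand and will stop working once the session expires.

```python
import shutil
import urllib.request


def build_ezproxy_request(download_url, session_id, cookie_name="ezproxy"):
    """Return a Request carrying the proxy session cookie.

    cookie_name is an assumption; use the name shown in your browser.
    """
    request = urllib.request.Request(download_url)
    request.add_header("Cookie", f"{cookie_name}={session_id}")
    return request


def download_file(download_url, filename, session_id, cookie_name="ezproxy"):
    """Stream the response body to disk using the authenticated request."""
    request = build_ezproxy_request(download_url, session_id, cookie_name)
    with urllib.request.urlopen(request, timeout=60) as response, \
            open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

In the question's loop, a call such as `download_file(download_url, filename, session_id)` would take the place of `urllib.request.urlretrieve`. With requests, the same cookie could instead be passed through the `cookies` argument, for example `requests.get(download_url, cookies={"ezproxy": session_id})`.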