Tags: python, requests, urllib, user-agent

Solving HTTP Error 400: Bad Request with working links in Google Chrome


I know this has been asked in many forms already, but I can't seem to find my answer and hope to get some help here. I am trying to download files that are stored behind a list of URLs.

I found the following function, which should do what I want:

import os.path
import urllib.request

# 'links' is a list of URL strings, one file per URL
for link in links:
    link = link.strip()
    name = link.rsplit('/', 1)[-1]          # file name = last part of the URL
    filename = os.path.join('downloads', name)

    if not os.path.isfile(filename):
        print('Downloading: ' + filename)
        try:
            urllib.request.urlretrieve(link, filename)
        except Exception as inst:
            print(inst)
            print('  Encountered unknown error. Continuing.')

I always receive: HTTP Error 400: Bad Request.

I tried setting a user agent to fake a browser visit (I use Google Chrome), but it did not help at all. The links work when copied into the browser, so I wonder how to solve this.
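For reference, this is roughly how I set the user agent with urllib.request: build a `Request` object with a custom `User-Agent` header and pass it to `urlopen`. A minimal sketch (the URL is a made-up placeholder; this alone did not fix the 400 error for me):

```python
import urllib.request

# Hypothetical URL for illustration only
url = "https://example.com/files/report.pdf"

# Attach a browser-like User-Agent header to the request
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

# The header travels with the request when it is opened:
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
print(req.get_header("User-agent"))  # Mozilla/5.0
```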


Solution

  • I found the answer to my own question.

    The problem was that the URLs contained whitespace, which urllib.request apparently cannot handle. The solution is to first percent-encode each URL with urllib.parse.quote and then retrieve the quoted URL.

    Here is the working code for everyone who runs into the same problem:

    import os.path
    import urllib.parse
    import urllib.request
    
    # 'urls' is a list of URL strings, one file per URL
    for link in urls:
        link = link.strip()
        name = link.rsplit('/', 1)[-1]      # file name = last part of the URL
        filename = name
        # Percent-encode the URL (e.g. spaces -> %20), keeping ':' and '/' intact
        quoted_url = urllib.parse.quote(link, safe=":/")
    
        if not os.path.isfile(filename):
            print('Downloading: ' + filename)
            try:
                urllib.request.urlretrieve(quoted_url, filename)
            except Exception as inst:
                print(inst)
                print('  Encountered unknown error. Continuing.')
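To see the effect of the quoting step in isolation (the example.com URL is a made-up illustration):

```python
from urllib.parse import quote

# A URL containing a space, which urlretrieve rejects as-is
url = "http://example.com/files/my report.pdf"

# safe=":/" keeps the scheme and path separators unencoded
quoted = quote(url, safe=":/")
print(quoted)  # http://example.com/files/my%20report.pdf
```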