
Python ValueError: unknown url type: space (?)


I am using the urllib2 module in Python 2.7 (in Spyder 3.0) to batch-download text files, reading their URLs from a text file that lists them:

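    import sys
    import urllib2
    # Python 2 workaround to make UTF-8 the default string encoding
    reload(sys)
    sys.setdefaultencoding('utf-8')
    with open('ocean_not_templated_url.txt', 'r') as text:
        lines = text.readlines()
        for line in lines:
            # Strip stray characters from the line, then open the URL
            url = urllib2.urlopen(line.strip('ï \xa0\t\n\r\v'))
            # Build a local file name from the URL ('/' -> '!', ':' -> '~')
            with open(line.strip('\n\r\t ').replace('/', '!').replace(':', '~'), 'wb') as out:
                for d in url:
                    out.write(d)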

I've already discovered and stripped a bunch of weird characters in the URLs. However, the script still fails when it is nearly 90% complete, raising ValueError: unknown url type: followed by what appears to be a space.

I suspected a non-breaking space (denoted by \xa0 in the code), but stripping it doesn't help. Any ideas?


Solution

  • That's an odd URL!

    The URL has to specify the protocol. Try prefixing it with http:// and the domain name if the file is hosted on the web.

    Every file resides somewhere, either in a directory on some server or locally on your system, so there must be a network path to it, for example:

    http://127.0.0.1/folder1/samuel/file1.txt

    The same example with localhost, which is usually an alias for 127.0.0.1:

    http://localhost/folder1/samuel/file1.txt

    That might solve the problem. Just think about where your file lives and how it should be addressed.
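
A minimal sketch of the prefixing idea, assuming a URL counts as having a protocol when it contains "://". The helper name ensure_scheme and the http default are illustrative assumptions, not part of urllib2:

```python
def ensure_scheme(url, default_scheme='http'):
    """Return url unchanged if it already names a protocol,
    otherwise prefix it with default_scheme (hypothetical helper)."""
    if '://' in url:
        return url
    return default_scheme + '://' + url


# A path without a protocol gets the default prefix.
print(ensure_scheme('localhost/folder1/samuel/file1.txt'))
# A URL that already has a protocol is left alone.
print(ensure_scheme('http://127.0.0.1/folder1/samuel/file1.txt'))
```

The check is deliberately crude. It only helps when the protocol is missing, and it does nothing for a line that is empty.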


    Update:

    I experimented quite a bit on this. I think I know why that error is raised! :D

    I suspect that the file storing the URLs has a sneaky empty line near the end. I say near the end because you mentioned that the script gets through about 90% of the list before failing. After stripping, that line becomes an empty string, so urllib2's Request.get_type method cannot find a protocol in it and raises ValueError: unknown url type: with nothing after the colon, which is why the message seems to end in a space.

    I think that's the problem! Remove that empty line in the file ocean_not_templated_url.txt and try it out!

    Just check and let me know! :P
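
If editing the file by hand is not practical, the blank lines can also be skipped in code. This is a rough sketch under assumptions: clean_url_lines is a hypothetical helper, and the set of characters it strips is taken from the question rather than from anything urllib2 requires:

```python
def clean_url_lines(lines, junk=' \xa0\t\n\r\v'):
    """Yield (line_number, url) for each line that still has content
    after stripping; blank lines are skipped (hypothetical helper)."""
    for number, line in enumerate(lines, 1):
        url = line.strip(junk)
        if not url:
            # An empty string here is what would make urlopen raise
            # "ValueError: unknown url type:" with nothing after the colon.
            continue
        yield number, url
```

In the loop from the question, iterating over clean_url_lines(text) and passing each url to urllib2.urlopen should avoid the empty-URL case. The line number is kept so that any other bad entry can be traced back to the list file.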