Search code examples
pythonpython-2.7request

No schema supplied and other errors with using requests.get()


I'm learning Python by following Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.

I'm on version 2.7 and Mac.

For some reason, I'm getting errors like "No schema supplied" and errors with using request.get() itself.

Here is my code:

# Saves the XKCD comic page for offline read

import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True: # If xkcd folder already exists
    shutil.rmtree('xkcd') # delete it
else: # otherwise
    os.makedirs('xkcd') # Creates xkcd foulder.


while not url.endswith('#'): # If there are no more posts, it url will endswith #, exist while loop
    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url) # Get the page
    res.raise_for_status() # Check for errors

    soup = bs4.BeautifulSoup(res.text) # Dowload the page
    # Find the URL of the comic image
    comicElem = soup.select('#comic img') # Any #comic img it finds will be saved as a list in comicElem
    if comicElem == []: # if the list is empty
        print 'Couldn\'t find the image!'
    else:
        comicUrl = comicElem[0].get('src') # Get the first index in comicElem (the image) and save to
        # comicUrl

        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
        res.raise_for_status() # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev btn's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is first <a rel="prev" href="/1535/" accesskey="p">&lt; Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')
    # adds /1535/ to http://xkcd.com/

print 'Done!'

Here are the errors:

Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?

The thing is I've been reading the section in the book about the program multiple times, reading the Requests doc, as well as looking at other questions on here. My syntax looks right.

Thanks for your help!

Edit:

This didn't work:

comicUrl = ("http:"+comicElem[0].get('src')) 

I thought adding the http: before would get rid of the no schema supplied error.


Solution

  • change your comicUrl to this

    comicUrl = comicElem[0].get('src').strip("http://")
    comicUrl="http://"+comicUrl
    if 'xkcd' not in comicUrl:
        comicUrl=comicUrl[:7]+'xkcd.com/'+comicUrl[7:]
    
    print "comic url",comicUrl