```python
import webapp2
from google.appengine.api import urlfetch


class Crawl(webapp2.RequestHandler):
    def get(self):
        url = "http://www.example.com/path/to a/page"  # URL with a space
        result = urlfetch.fetch(url)
        self.response.write('status: %s' % result.status_code)  # outputs 400
        self.response.write(result.content)  # gives me the 400 error page
```
There are thousands of URLs out there that contain spaces, and there is no way we can correct them one by one.
Why does urlfetch return a 400 Bad Request error for this kind of URL, when the same URL loads fine in a browser? How can I work around this?
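This happens because the URL needs to be properly encoded (as discussed below). A raw space is not a legal character in a URL, so make sure any URLs containing spaces are encoded with %20 in place of each space. Browsers percent-encode the space automatically before sending the request, which is why the same URL works there.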
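Rather than fixing URLs by hand, the encoding can be applied programmatically just before fetching. The sketch below is one possible approach, not an official App Engine API: the helper name `encode_url` is hypothetical, and it assumes the standard-library `quote`/`urlsplit`/`urlunsplit` functions (from `urllib.parse` on Python 3, or `urllib`/`urlparse` on the Python 2 runtime). It splits the URL, percent-encodes unsafe characters in the path, query and fragment, and reassembles it.

```python
try:
    # Python 3
    from urllib.parse import quote, urlsplit, urlunsplit
except ImportError:
    # Python 2 (e.g. the legacy App Engine runtime)
    from urllib import quote
    from urlparse import urlsplit, urlunsplit


def encode_url(url):
    """Hypothetical helper: percent-encode unsafe characters (such as spaces)
    in the path, query and fragment. '%' is treated as safe so that escapes
    already present (e.g. %20) are not double-encoded."""
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    # Keep the query's structural characters ('=', '&', '+') as they are.
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(encode_url("http://www.example.com/path/to a/page"))
# http://www.example.com/path/to%20a/page
```

In the handler from the question, the call would then become `urlfetch.fetch(encode_url(url))`. Treating `%` as safe is a trade-off: it avoids turning an existing `%20` into `%2520`, but a stray literal `%` that is not part of an escape sequence is passed through unchanged.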