
Python script to see if a web page exists without downloading the whole page?


I'm trying to write a script that tests whether a web page exists; it would be nice if it could check without downloading the whole page.

This is my jumping-off point. I've seen multiple examples use httplib in the same way; however, every site I check simply returns False.

import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])              # p[1] is the host ('netloc') part of the URL
    h.putrequest('HEAD', p[2])  # p[2] is the path; HEAD asks for headers only
    h.endheaders()
    return h.getreply()[0] == httplib.OK  # only matches exactly 200

if __name__ == "__main__":
    print checkUrl("http://www.stackoverflow.com") # expected True, but prints False
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False

Any ideas?

Edit

Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?

import urllib2

def checkUrl(some_url):
    try:
        urllib2.urlopen(some_url)  # urlopen issues a GET by default
        return True
    except urllib2.URLError:       # also catches HTTPError (4xx/5xx), its subclass
        return False
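
To answer that question: urlopen defaults to GET, so the server starts sending the body whether or not you ever read it. urllib2 can be told to send HEAD instead by overriding Request.get_method, which is a real urllib2 hook; the HeadRequest subclass below is only an illustrative name:

import urllib2

class HeadRequest(urllib2.Request):
    # urlopen asks the Request for its HTTP verb via get_method();
    # overriding it makes urlopen send HEAD instead of GET.
    def get_method(self):
        return 'HEAD'

def checkUrlHead(url):
    try:
        urllib2.urlopen(HeadRequest(url))
        return True
    except urllib2.URLError:  # covers HTTPError (4xx/5xx) and network errors
        return False

As a bonus, urllib2 follows redirects on its own, so the 301 issue discussed in the solution below does not trip this check (though the redirected request may be reissued as a GET).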

Solution

  • How about this:

    import httplib
    from urlparse import urlparse

    def checkUrl(url):
        p = urlparse(url)
        conn = httplib.HTTPConnection(p.netloc)  # netloc is the 'host[:port]' part
        conn.request('HEAD', p.path)             # HEAD: status line and headers, no body
        resp = conn.getresponse()
        return resp.status < 400                 # accept 2xx and 3xx as "exists"

    if __name__ == '__main__':
        print checkUrl('http://www.stackoverflow.com') # True
        print checkUrl('http://stackoverflow.com/notarealpage.html') # False
    

    This sends an HTTP HEAD request, so only the status line and headers come back over the wire, and returns True if the response status code is below 400. A more defensive variant is sketched after the note below.

    • Notice that www.stackoverflow.com's root path returns a redirect (301), not 200 OK, which is why the original script's comparison against httplib.OK always came back False.
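
    One remaining caveat: checkUrl raises an exception (for example, socket.gaierror for an unresolvable host) instead of returning False when the network itself fails. A defensive sketch of the same approach; checkUrlSafe is an illustrative name, not part of the original answer:

    import httplib
    import socket
    from urlparse import urlparse

    def checkUrlSafe(url, timeout=10):
        # Variant of checkUrl that treats network failures (DNS errors,
        # refused connections, timeouts) as "page does not exist".
        p = urlparse(url)
        conn = httplib.HTTPConnection(p.netloc, timeout=timeout)
        try:
            conn.request('HEAD', p.path or '/')  # make the empty-path case explicit
            return conn.getresponse().status < 400
        except (httplib.HTTPException, socket.error):
            return False
        finally:
            conn.close()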