python, python-2.7, urllib2

Python urllib2.Request: title in the response is not matched for Instagram


I have the following code:

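import urllib2

def check_proxy(input_queue):
    while True:
        prx = input_queue.get()
        try:
            proxy_handler = urllib2.ProxyHandler({'http': prx})
            opener = urllib2.build_opener(proxy_handler)
            opener.addheaders = [('User-agent', 'Mozilla/5.0')]
            req = urllib2.Request("http://www.google.com")
            # Use this thread's opener directly; install_opener() is global
            # and would race with the other worker threads.
            sock = opener.open(req, timeout=7)
            rs = sock.read(1000)
            if '<title>Google</title>' in rs:
                print '[OK]', prx
        except Exception:
            pass  # dead, slow or refusing proxy: skip it
        finally:
            # Mark every item done, not only working proxies, so queue.join() returns.
            input_queue.task_done()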

I changed the URL being checked from google.com to instagram.com and changed the expected title to Instagram, but the check no longer succeeds.

I printed rs for Instagram and got this title:

        <title>
Instagram
</title>

How can I make this work to check https://www.instagram.com instead of google.com?
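One related detail when the target is an https:// URL: urllib2.ProxyHandler picks a proxy by URL scheme, so with only an 'http' key (as in the code above) a request to https://www.instagram.com goes out directly instead of through the proxy. The sketch below is a minimal illustration, assuming the same host:port string prx serves both schemes; the helper name build_proxy_opener is hypothetical. It imports urllib2 on Python 2.7 and falls back to urllib.request on Python 3.

```python
try:
    import urllib2 as request          # Python 2.7, as in the question
except ImportError:
    import urllib.request as request   # Python 3 equivalent module


def build_proxy_opener(prx):
    """Hypothetical helper: an opener that sends both http and https through prx."""
    # ProxyHandler chooses the proxy by URL scheme, so https needs its own key.
    handler = request.ProxyHandler({'http': prx, 'https': prx})
    opener = request.build_opener(handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    return opener
```

Inside the worker loop the opener would then be used as opener.open("https://www.instagram.com", timeout=7), without calling install_opener().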

Thank you


Solution

  • The string to match the title in Instagram's HTML should be '<title>\nInstagram\n</title>'. Google's HTML has no newlines inside the title element, but Instagram's does, so an exact substring check for '<title>Instagram</title>' never matches.
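Because the exact layout of the markup can change (newlines, indentation, attributes on the tag), a check that ignores whitespace inside the title is less brittle than hard-coding '\n'. This is a minimal sketch, not the answer's own method: the helper names page_title and has_title are hypothetical, and a regular expression is assumed to be good enough for pulling out a single <title> element. It works on the str returned by read() in Python 2; in Python 3 the bytes would need decoding first.

```python
import re

# Captures the contents of the first <title>...</title>, spanning newlines.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def page_title(html):
    """Return the whitespace-normalized <title> text, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse newlines and indentation, so "\nInstagram\n" becomes "Instagram".
    return ' '.join(match.group(1).split())


def has_title(html, expected):
    """True when the page title equals `expected`, ignoring surrounding whitespace."""
    return page_title(html) == expected
```

In the worker loop the substring test would become `if has_title(rs, 'Instagram'):`, and the same call with 'Google' still works for google.com.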