Tags: python, urllib2, selenium-webdriver

Python Splinter (SeleniumHQ) how to take a screenshot of many webpages? [Connection refused]


I want to take a screenshot of many webpages, so I wrote this:

from splinter.browser import Browser
import urllib2
from urllib2 import URLError

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']


try :
    browser = Browser('firefox')
    for i in range(0, len(urls)) :
        browser.visit(urls[i])
        if browser.status_code.is_success() :
            browser.driver.save_screenshot('your_screenshot' + str(i) + '.png')
        browser.quit()
except SystemError :
    print('install firefox!')
except urllib2.URLError, e:
    print(e)
    print('theres no such website')
except Exception, e :
    print(e)
    browser.quit()

and I got this error:

<urlopen error [Errno 111] Connection refused>

How can I fix it? :)

EDIT

When I have the links in a txt file, the code below doesn't work:

from splinter import Browser
import socket

urls = []
numbers = []

with open("urls.txt", 'r') as filename :
    for line in filename :
        line = line.strip()
        words = line.split("\t")
        numbers.append(str(words[0]))
        urls.append(str(words[1].rstrip()))

print(urls)

browser = None    
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()

My txt file looks like this:

1   http//ubuntu.com/
2   http//xubuntu.org/
3   http//kubuntu.org/

When I ran it, I got these errors:

$ python test.py 
['http//ubuntu.com/', 'http//xubuntu.org/', 'http//kubuntu.org/']
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    browser.visit(url)
  File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 79, in visit
    self.driver.get(url)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 168, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: u'Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) [nsIIOService.newURI]'

What's wrong this time?


Solution

  • Your problem is that you call browser.quit() inside your loop over the URLs, so the browser is no longer open when you visit the second URL.

    Here's an updated version of your code:

    from splinter import Browser
    import socket
    
    urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
    
    browser = None    
    try:
        browser = Browser('firefox')
        for i, url in enumerate(urls, start=1):
            try:
                browser.visit(url)
                if browser.status_code.is_success():
                    browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
            except socket.gaierror, e:
                print "URL not found: %s" % url
    finally:
        if browser is not None:
            browser.quit()
    

    The major change is moving the browser.quit() call into the finally clause of your main try block, so that it runs no matter what goes wrong. Note also the use of enumerate to get both the item and its index; this is the recommended approach in Python over maintaining your own index counter (see the short illustration below).
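
    For example, on its own (this snippet is separate from the screenshot code and only illustrates the idiom):

    urls = ['http://ubuntu.com/', 'http://xubuntu.org/']

    # index-based loop, as in the original code
    for i in range(0, len(urls)):
        print(urls[i])

    # enumerate yields the index and the item together;
    # start=1 makes the numbering begin at 1 instead of 0
    for i, url in enumerate(urls, start=1):
        print('%d: %s' % (i, url))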

    I'm not sure if it's relevant for your code, but I found that splinter raised socket.gaierror exceptions rather than urllib2.URLError, so I showed how you could trap those as well. I also moved this exception handler inside the loop, so the script continues to grab the remaining screenshots even if one or more of the URLs don't exist.
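
    Regarding your edit: the NS_ERROR_MALFORMED_URI comes from the URLs in your txt file, which are all missing the "://" after "http" (for example "http//ubuntu.com/"), so Firefox cannot parse them. The simplest fix is to correct the file itself; if you would rather guard against it in code, here is a rough sketch (normalize_url is just an illustrative helper, not something splinter provides):

    from urlparse import urlparse

    def normalize_url(url):
        # repair the "http//" / "https//" typo seen in the txt file
        url = url.replace('http//', 'http://').replace('https//', 'https://')
        # if there is still no scheme at all, assume plain http
        if not urlparse(url).scheme:
            url = 'http://' + url
        return url

    # then, inside the loop:
    #     browser.visit(normalize_url(url))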