I want to take a screenshot of many webpages, so I wrote this:
from splinter.browser import Browser
import urllib2
from urllib2 import URLError
urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
try:
    browser = Browser('firefox')
    for i in range(0, len(urls)):
        browser.visit(urls[i])
        if browser.status_code.is_success():
            browser.driver.save_screenshot('your_screenshot' + str(i) + '.png')
        browser.quit()
except SystemError:
    print('install firefox!')
except urllib2.URLError, e:
    print(e)
    print("there's no such website")
except Exception, e:
    print(e)
    browser.quit()
and I got this error:
<urlopen error [Errno 111] Connection refused>
How can I fix it? :)
EDIT
When I have the links in a txt file, the code below doesn't work:
from splinter import Browser
import socket
urls = []
numbers = []
with open("urls.txt", 'r') as filename:
    for line in filename:
        line = line.strip()
        words = line.split("\t")
        numbers.append(str(words[0]))
        urls.append(str(words[1].rstrip()))

print(urls)

browser = None
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()
My txt file looks like this:
1 http//ubuntu.com/
2 http//xubuntu.org/
3 http//kubuntu.org/
When I ran it, I got this error:
$ python test.py
['http//ubuntu.com/', 'http//xubuntu.org/', 'http//kubuntu.org/']
Traceback (most recent call last):
File "test.py", line 21, in <module>
browser.visit(url)
File "/usr/local/lib/python2.7/dist-packages/splinter/driver/webdriver/__init__.py", line 79, in visit
self.driver.get(url)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 168, in get
self.execute(Command.GET, {'url': url})
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 147, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: u'Component returned failure code: 0x804b000a (NS_ERROR_MALFORMED_URI) [nsIIOService.newURI]'
What's wrong this time?
Your problem is that you call browser.quit() inside your loop over the URLs, so the browser is no longer open for the second URL. Here's an updated version of your code:
from splinter import Browser
import socket
urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
browser = None
browser = None
try:
    browser = Browser('firefox')
    for i, url in enumerate(urls, start=1):
        try:
            browser.visit(url)
            if browser.status_code.is_success():
                browser.driver.save_screenshot('your_screenshot_%03d.png' % i)
        except socket.gaierror, e:
            print "URL not found: %s" % url
finally:
    if browser is not None:
        browser.quit()
The major change is moving the browser.quit() call into the main exception handler's finally clause, so that it runs no matter what goes wrong. Note also the use of enumerate to provide both the item and its index; this is the recommended approach in Python over maintaining your own index counter.
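For example, enumerate yields (index, item) pairs, and the start keyword controls the first index, which is what produces the 001, 002, ... filenames (a standalone illustration, separate from the splinter code):

```python
# enumerate pairs each item with a running count, here starting at 1.
urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
for i, url in enumerate(urls, start=1):
    # %03d zero-pads the index to three digits: 001, 002, ...
    print('your_screenshot_%03d.png -> %s' % (i, url))
```
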
I'm not sure if it's relevant for your code, but I found that splinter raised socket.gaierror exceptions rather than urllib2.URLError, so I showed how you can trap those as well. I also moved that exception handler inside the loop, so the script keeps taking the remaining screenshots even if one or more of the URLs don't exist.
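As for your EDIT: the NS_ERROR_MALFORMED_URI message points at the URLs themselves. Every line in your txt file is missing the colon after the scheme (http// instead of http://), so Firefox cannot parse them. A minimal sketch of repairing such entries before calling browser.visit() (the helper name is my own, not part of splinter):

```python
# Hypothetical helper: repair URLs missing the colon after the scheme,
# e.g. 'http//ubuntu.com/' -> 'http://ubuntu.com/'.
def normalize_url(url):
    for scheme in ('http', 'https'):
        broken = scheme + '//'  # malformed prefix, colon missing
        if url.startswith(broken):
            return scheme + '://' + url[len(broken):]
    return url  # already well-formed, leave unchanged

print(normalize_url('http//ubuntu.com/'))    # http://ubuntu.com/
print(normalize_url('http://xubuntu.org/'))  # http://xubuntu.org/
```

Calling normalize_url() on each entry as you read it from urls.txt should make the visits succeed without editing the file by hand.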