I use the Scrapy framework to crawl data. My crawler gets interrupted whenever it encounters a 500 error, so I need to check whether a link is available before I parse the page content.
Is there any way to solve this problem?
Thank you.
To check whether the URL responds, you can open it with urllib and inspect the HTTP status code via the response object's getcode() method:
import sys
import urllib

# Python 2: urllib.urlopen() returns a response object even for a 500,
# so we can inspect the status code directly.
webFile = urllib.urlopen('http://www.some.url/some/file')
returnCode = webFile.getcode()  # note: getcode(), not getCode()

if returnCode == 500:
    sys.exit()
# otherwise, go ahead and parse the content
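
That said, since you are already using Scrapy, you may not need a separate pre-check at all. As a sketch, assuming your spider subclasses scrapy.Spider, the handle_httpstatus_list attribute lets responses with non-200 codes such as 500 reach your callback instead of aborting the crawl, so you can skip them gracefully. The spider name, start URL, and callback body below are hypothetical placeholders:

import scrapy

class MySpider(scrapy.Spider):
    # hypothetical spider name and start URL for illustration
    name = 'my_spider'
    start_urls = ['http://www.some.url/some/file']
    # let 500 responses reach parse() instead of being dropped by Scrapy
    handle_httpstatus_list = [500]

    def parse(self, response):
        if response.status == 500:
            # skip this page gracefully instead of interrupting the crawl
            self.logger.warning('Server error at %s, skipping', response.url)
            return
        # normal parsing of a healthy response goes here

This avoids a second request per URL, which the urllib pre-check approach would cost you.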