Tags: python-3.x, web-scraping, scrapy, scrapy-splash

How to handle DNSLookupError in Scrapy?


I am checking the response status of a bunch of websites and exporting the results to a CSV file. A couple of the sites fail with a DNSLookupError (no website found), so nothing gets stored in the CSV for them. How can I store the DNSLookupError message in the CSV along with the URL?

def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }

Solution

  • You can use the errback parameter of scrapy.Request to catch a DNSLookupError, or any other type of error. See the sample usage below.

    import scrapy
    from twisted.internet.error import DNSLookupError


    class TestSpider(scrapy.Spider):
        name = 'test'
        allowed_domains = ['example.com']

        def start_requests(self):
            # nonexistent.example.com does not resolve, so Scrapy calls
            # the errback with a DNSLookupError instead of the callback
            yield scrapy.Request(
                url="http://nonexistent.example.com/error",
                callback=self.parse,
                errback=self.parse_error,
            )

        def parse_error(self, failure):
            if failure.check(DNSLookupError):
                # failure.request is the original request
                request = failure.request
                yield {
                    'URL': request.url,
                    # str() turns the exception into a readable
                    # message for the CSV
                    'Status': str(failure.value),
                }

        def parse(self, response):
            yield {
                'URL': response.url,
                'Status': response.status,
            }
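
  • The same errback can record other failure types as well, for example timeouts or non-2xx responses. The following is a minimal sketch, assuming you prefer a short readable label in the CSV over the raw exception object:

    from scrapy.spidermiddlewares.httperror import HttpError
    from twisted.internet.error import DNSLookupError, TimeoutError, TCPTimedOutError

    def parse_error(self, failure):
        # failure.request is the original request in every case
        request = failure.request
        if failure.check(HttpError):
            # non-2xx responses are routed here by HttpErrorMiddleware
            status = failure.value.response.status
        elif failure.check(DNSLookupError):
            status = 'DNSLookupError'
        elif failure.check(TimeoutError, TCPTimedOutError):
            status = 'Timeout'
        else:
            status = repr(failure.value)
        yield {'URL': request.url, 'Status': status}

  • Run the spider with Scrapy's built-in CSV feed export so that both the successful and the failed URLs end up in the same file:

    scrapy crawl test -o output.csv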