Tags: python-3.x, web-scraping, scrapy, scrapy-splash

How to handle DNSLookupError in Scrapy?


I am checking the response status of a bunch of websites and exporting the results to a CSV file. A couple of the sites fail with a DNSLookupError (no website found), so nothing gets stored in the CSV for them. How can I store the DNSLookupError message in the CSV along with the URL?

def parse(self, response):
    yield {
        'URL': response.url,
        'Status': response.status
    }

Solution

  • You can use the errback parameter of scrapy.Request to catch a DNSLookupError, or any other type of error. See the sample usage below.

    import scrapy
    from twisted.internet.error import DNSLookupError


    class TestSpider(scrapy.Spider):
        name = 'test'
        allowed_domains = ['example.com']

        def start_requests(self):
            # nonexistent.example.com does not resolve, so Scrapy calls
            # the errback with a DNSLookupError instead of the callback
            yield scrapy.Request(
                url="http://nonexistent.example.com/error",
                callback=self.parse,
                errback=self.parse_error,
            )

        def parse_error(self, failure):
            if failure.check(DNSLookupError):
                # failure.request is the original request
                request = failure.request
                yield {
                    'URL': request.url,
                    # str() turns the exception into a readable
                    # message for the CSV
                    'Status': str(failure.value),
                }

        def parse(self, response):
            yield {
                'URL': response.url,
                'Status': response.status,
            }
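
  • The same errback can record other failure types as well, for example timeouts or non-2xx responses. The following is a minimal sketch, assuming you prefer a short readable label in the CSV over the raw exception object:

    from scrapy.spidermiddlewares.httperror import HttpError
    from twisted.internet.error import DNSLookupError, TimeoutError, TCPTimedOutError

    def parse_error(self, failure):
        # failure.request is the original request in every case
        request = failure.request
        if failure.check(HttpError):
            # non-2xx responses are routed here by HttpErrorMiddleware
            status = failure.value.response.status
        elif failure.check(DNSLookupError):
            status = 'DNSLookupError'
        elif failure.check(TimeoutError, TCPTimedOutError):
            status = 'Timeout'
        else:
            status = repr(failure.value)
        yield {'URL': request.url, 'Status': status}

  • Run the spider with Scrapy's built-in CSV feed export so that both the successful and the failed URLs end up in the same file:

    scrapy crawl test -o output.csv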