Scrapy scraping all pages in domain

I'm tearing my hair out trying to get Scrapy to crawl all pages in a domain. I tried the rules = [Rule(LinkExtractor(), callback='parse_item', follow=True)] approach (instead of the dragon_start function in my code) and didn't get anywhere. Now I'm trying to extract all links and iterate over that list, and that's not working either. What am I missing? Copilot isn't helping, and I've looked at pretty much all the other SO posts...

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy import Request

class DvSpider(CrawlSpider):
    name = "dvspider"
    start_urls = [""]
    allowed_domains = [""]

    def dragon_start(self, response):
        links = response.css('a::attr(href)').extract()
        for link in links:
            yield Request(response.urljoin(link), self.parse_item)

    def parse_item(self, response):
        if response.url[-7] == '_':
            dragon = response.css('table.dragonbox')

            if dragon:
                rows = dragon.xpath('//tr')
                yield {
                    'DragonName': rows[0].css('b::text').get().strip(),

I don't get any errors, and Scrapy fetches start_urls with a 200 status code. But then the spider immediately says INFO: Closing spider (finished).


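  • I think you are missing the start_requests entry point. Nothing in your spider ever calls dragon_start: CrawlSpider sends the responses for start_urls to its own built-in parse method, which only follows links through rules, and your spider defines none. So the start page is fetched, no new requests are produced, and the spider closes.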

    For example,

    from scrapy import Request
    from scrapy.spiders import CrawlSpider


    class DvSpider(CrawlSpider):
        name = "dvspider"
        start_urls = [""]
        allowed_domains = [""]

        def start_requests(self):
            # Send the first response to dragon_start instead of CrawlSpider's default parse
            yield Request(self.start_urls[0], self.dragon_start)

        def dragon_start(self, response):
            # Request every link found on the start page
            links = response.css('a::attr(href)').extract()
            for link in links:
                yield Request(response.urljoin(link), self.parse_item)

        def parse_item(self, response):
            if response.url[-7] == '_':
                dragon = response.css('table.dragonbox')
                if dragon:
                    # './/tr' stays inside the table; '//tr' would match every row on the page
                    rows = dragon.xpath('.//tr')
                    yield {
                        'DragonName': rows[0].css('b::text').get().strip(),
                        # remaining fields are cut off in the question
                    }

    With this change, Scrapy calls dragon_start on the start page, which requests every link found there and passes each response to parse_item.