Scrapy not returning all the items it should

I'm trying to get Scrapy to crawl through a website, but constrain it only to pages that match a certain pattern, and it's giving me a headache.

The website is structured like this:

And so on.

I need it to start crawling from the category page and then follow all the links that lead to another page (there are 375 pages in total, and the number is not fixed, of course).

The problem is that it crawls through ~10 pages before I stop it, but it only returns 10-15 items, where there should be 200+.

Here is my code, which doesn't work right:

from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule

# Website is the project's own Item subclass (its import is not shown here).


class WSSpider(CrawlSpider):
    name = "ws"
    allowed_domains = [""]
    start_urls = [""]
    rules = (
        Rule(LinkExtractor(allow=("/level_one/page*",)), callback="parse_product", follow=True),
    )

    def parse_product(self, response):
        sel = Selector(response)
        sites = sel.css(".pb-infos")
        items = []

        # Build one item per product block on the page.
        for site in sites:
            item = Website()
            item["brand"] = site.css(".pb-name .pb-mname::text").extract()
            item["referinta"] = site.css(".pb-name a::text").extract()
            item["disponibilitate"] = site.css(".pb-availability::text").extract()
            item["pret_vechi"] = site.css(".pb-sell .pb-old::text").extract()
            item["pret"] = site.css(".pb-sell .pb-price::text").extract()
            item["procent"] = site.css(".pb-sell .pb-savings::text").extract()
            items.append(item)

        #return items
        f = open("output.csv", "w")
        for item in items:
            line = \
                item["brand"][0].strip(), ";", \
                item["referinta"][-1].strip(), ";", \
                item["disponibilitate"][0].strip(), ";", \
                item["pret_vechi"][0].strip().strip(" lei"), ";", \
                item["pret"][0].strip().strip(" lei"), ";", \
                item["procent"][0].strip().strip("Mai ieftin cu "), "\n"
            # line is a tuple of strings; join it into one CSV row.
            f.write("".join(line))
        f.close()

Any help is much appreciated!


  • I found my (stupid) mistake.

    f = open("output.csv", "w")

    should in fact be

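    f = open("output.csv", "a")

    parse_product runs once for every page that gets crawled, and opening the file with "w" truncates it each time, so only the items from the last page processed were left in the file.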
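
The difference between the two modes can be reproduced without Scrapy. Below is a minimal sketch, assuming three made-up "pages" of two rows each; save_page is a hypothetical stand-in for the spider callback, and the file names and row data are invented for illustration.

```python
import os
import tempfile


def save_page(path, rows, mode):
    """Hypothetical stand-in for the spider callback: writes one page's rows."""
    with open(path, mode) as f:
        for row in rows:
            f.write(";".join(row) + "\n")


def count_lines(path):
    """Count the rows currently stored in the file."""
    with open(path) as f:
        return sum(1 for _ in f)


# Three fake "pages" with two rows each (made-up data for illustration).
pages = [[("brand%d" % p, "ref%d" % i) for i in range(2)] for p in range(3)]

tmpdir = tempfile.mkdtemp()
w_path = os.path.join(tmpdir, "w.csv")
a_path = os.path.join(tmpdir, "a.csv")

for rows in pages:
    save_page(w_path, rows, "w")  # truncates the file on every call
    save_page(a_path, rows, "a")  # appends to what is already there

w_lines = count_lines(w_path)
a_lines = count_lines(a_path)
print(w_lines, a_lines)  # 2 6
```

Keep in mind that "a" also keeps the rows from earlier runs, so the file should be deleted before crawling again. Another option, which I have not tried against this site, is to uncomment the return items line and let Scrapy write the file itself with scrapy crawl ws -o output.csv.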