python web-scraping scrapy scrapy-splash

Scrapy program is not scraping all data


I am writing a Scrapy spider to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, but it only scrapes the first line of data and not the rest. I think the problem is in my for loop, but when I broaden the loop's selector it outputs too much data: each line of data is output multiple times.

    def parse(self, response):
        item = GameItem()
        saved_name = ""
        for game in response.css("div.row.mt-1.list-view"):
            saved_name  = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item

UPDATE #1



    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            item["Card_Name"]  = game.css("a.card-text::text").get()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
                item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
            yield item


Solution

  • I think you need the CSS selectors below (later you can use this as a base for processing the buying-options container):

    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()  # a fresh item for every card row
            card_name = game.css("a.card-text::text").get()
            # get() returns None when nothing matches, so guard before strip()
            item["Card_Name"] = card_name.strip() if card_name else None
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                # process buying_option here; you may need to move the
                # GameItem() initialization (and the yield) inside this loop
                pass

            yield item
    

    As you can see, I moved item = GameItem() inside the loop. Also, there is no need for saved_name here.
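
    The nested loop above can be fleshed out as a minimal sketch. To keep it runnable without a crawl, it swaps Scrapy selectors for the standard library's xml.etree.ElementTree and parses a made-up, simplified snippet. The markup, the span class names, the card names, and the prices are all assumptions for illustration, not the real page. The point is the shape of the loops: a new item is built for each buying option, and the condition and price are looked up relative to that option rather than relative to the whole card row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure may differ.
SAMPLE = """
<div class="page">
  <div class="card">
    <div class="row">
      <a class="card-text"> Card A </a>
      <div class="buying-options-table">
        <div class="row"><span class="condition">Condition</span><span class="price">Price</span></div>
        <div class="row"><span class="condition">Near Mint</span><span class="price">1.00</span></div>
        <div class="row"><span class="condition">Played</span><span class="price">0.75</span></div>
      </div>
    </div>
  </div>
  <div class="card">
    <div class="row">
      <a class="card-text"> Card B </a>
      <div class="buying-options-table">
        <div class="row"><span class="condition">Condition</span><span class="price">Price</span></div>
        <div class="row"><span class="condition">Near Mint</span><span class="price">2.50</span></div>
      </div>
    </div>
  </div>
</div>
"""


def parse(root):
    """Yield one dict per buying option, mirroring the nested loops above."""
    # Outer loop: one iteration per card row (ElementTree matches the class
    # attribute as an exact string, unlike a CSS class selector).
    for game in root.findall("./div[@class='card']/div[@class='row']"):
        name = game.find("./a[@class='card-text']")
        card_name = (name.text or "").strip() if name is not None else ""
        options = game.findall(
            "./div[@class='buying-options-table']/div[@class='row']"
        )
        # Inner loop: skip the header row, like div.row:not(:first-child).
        for option in options[1:]:
            # A fresh item per buying option, so rows never overwrite each other.
            item = {"Card_Name": card_name}
            # Look up fields relative to the option, not the whole card row.
            item["Condition"] = option.findtext("./span[@class='condition']")
            item["Price"] = option.findtext("./span[@class='price']")
            yield item


items = list(parse(ET.fromstring(SAMPLE.strip())))
for item in items:
    print(item)
```

    In the spider itself, the same idea would mean calling buying_option.css(...) instead of game.css(...) inside the inner loop and yielding there. Calling game.css(...).get() in that loop returns the first match within the whole card row on every iteration, which is one way to end up with the same condition and price repeated.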