Getting an error when trying to scrape items from Flipkart using Python


Can you please tell me what might be wrong with this code? I am trying to scrape items from Flipkart.

import scrapy

class flipkart_scrapy(scrapy.Spider):
    name = 'flipkart'
    urls = ['https://www.flipkart.com/televisions/pr?sid=ckf%2Cczl&p%5B%5D=facets.brand%255B%255D%3DMi&otracker=categorytree&p%5B%5D=facets.serviceability%5B%5D%3Dtrue&p%5B%5D=facets.availability%255B%255D%3DExclude%2BOut%2Bof%2BStock&otracker=nmenu_sub_TVs%20%26%20Appliances_0_Mi']
    base_url = urls[0]
    page_no = 2
    next_page = base_url + '&page=' + str(page_no)

    def parse(self, response):
        for product in response.css("div._2kHMtA"):
            yield {
                'name': product.css("div._4rR01T::text").get(),
                'price': product.css('div._30jeq3._1_WHN1::text').get(),
                'rating': product.css("div._3LWZlK::text").get(),
            }

        if self.next_page is not None:
            yield response.follow(self.next_page, callback=self.parse)
            self.page_no += 1
            self.next_page = self.base_url + '&page=' + str(self.page_no)

That is the code I am trying to run, using: scrapy crawl flipkart

The spider does not scrape anything.


Solution

  • Your spider doesn't do anything because it defines neither start_urls nor start_requests, so Scrapy has no initial requests to send.

    From the Scrapy API documentation for scrapy.Spider:

    This is the simplest spider, and the one from which every other spider must inherit (including spiders that come bundled with Scrapy, as well as spiders that you write yourself). It doesn’t provide any special functionality. It just provides a default start_requests() implementation which sends requests from the start_urls spider attribute and calls the spider’s method parse for each of the resulting responses.

    To fix this, rename your spider's urls attribute to start_urls (and change base_url = urls[0] to base_url = start_urls[0] to match), or override the start_requests method.

    For example:

    class flipkart_scrapy(scrapy.Spider):
        name = 'flipkart'
        start_urls = [...]  # renamed from urls
        base_url = start_urls[0]  # this reference must be renamed as well