python, html, scrapy, rotten-tomatoes

Scrapy spider not scraping correct div


import scrapy
class rottenTomatoesSpider(scrapy.Spider):
    name = "movieList"
    start_urls = [
         'https://www.rottentomatoes.com/'
    ]

    def parse(self, response):
        for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
            yield {
                'score': response.css('td.left_col').extract_first(),
                'title': response.css('td.middle_col').extract_first(),
                'openingDate': response.css('td.right_col right').extract_first()
            }

The spider is instead scraping <div id='homepage-tv-top'>.

I'm assuming the homepage- prefix in the id is confusing the script. Does anyone know a workaround?


Solution

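  • You need to iterate over each tr, and inside the for loop call .css() on movieList instead of response. response.css(...) searches the whole page, so extract_first() returns the first matching cell in the document (here, one inside homepage-tv-top) no matter which div the loop selected. Also note that 'td.right_col right' looks for a <right> element inside the cell, so it matches nothing; 'td.right_col' is enough.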

    for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
        yield {
           'score': "".join(movieList.css('td.left_col *::text').extract()),
           'title': "".join(movieList.css('td.middle_col *::text').extract()),
           'openingDate': "".join(movieList.css('td.right_col *::text').extract())
        }
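
The scoping issue can be reproduced offline, which may make it easier to see why the original loop returned data from the wrong div. The sketch below is not Scrapy: it swaps in the standard library's xml.etree.ElementTree so it runs without any install or network access. The PAGE markup and the first_text helper are made up for illustration and are not the real Rotten Tomatoes HTML. The point carries over to Scrapy: a query started from the whole document returns the first match on the page, while a query started from the current row stays inside that row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the homepage: two sections that
# reuse the same td class names (not the real Rotten Tomatoes markup).
PAGE = """
<html><body>
  <div id="homepage-tv-top">
    <table>
      <tr><td class="left_col">90%</td><td class="middle_col">Some TV Show</td></tr>
    </table>
  </div>
  <div id="homepage-opening-this-week">
    <table>
      <tr><td class="left_col">75%</td><td class="middle_col">Movie A</td></tr>
      <tr><td class="left_col">60%</td><td class="middle_col">Movie B</td></tr>
    </table>
  </div>
</body></html>
"""


def first_text(node, css_class):
    """Return the text of the first <td> with this class below `node`, or None."""
    cell = node.find(f".//td[@class='{css_class}']")
    return cell.text if cell is not None else None


root = ET.fromstring(PAGE)
section = root.find(".//div[@id='homepage-opening-this-week']")

# Pattern from the question: the right div is selected, but the lookup
# starts from the whole document, so the first match on the page wins.
page_scoped = first_text(root, "middle_col")

# Pattern from the answer: loop over the rows of the selected div and
# start every lookup from the current row.
row_scoped = [first_text(row, "middle_col") for row in section.iter("tr")]

print(page_scoped)  # Some TV Show
print(row_scoped)   # ['Movie A', 'Movie B']
```

In the Scrapy version, response plays the role of root and movieList plays the role of row, which is why the answer calls movieList.css(...) inside the loop.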