import scrapy
class rottenTomatoesSpider(scrapy.Spider):
name = "movieList"
start_urls = [
'https://www.rottentomatoes.com/'
]
def parse(self, response):
for movieList in response.xpath('//div[@id="homepage-opening-this-week"]'):
yield {
'score': response.css('td.left_col').extract_first(),
'title': response.css('td.middle_col').extract_first(),
'openingDate': response.css('td.right_col right').extract_first()
}
So the spider is instead scraping <div id='homepage-tv-top'>
I'm assuming it is the homepage-
that is confusing the script. Anyone know the workaround?
You need to iterate over each tr
and and also in for loop use movieList
instead of response
for movieList in response.xpath('//div[@id="homepage-opening-this-week"]//tr'):
yield {
'score': "".join(a for a in movieList.css('td.left_col *::text').extract()),
'title': "".join(a for a in movieList.css('td.middle_col *::text').extract()),
'openingDate': "".join(a for a in movieList.css('td.right_col *::text').extract())
}