I am scraping two pages for a single id iteratively. The first scraper works for all ids, but the second one works for only one id.
import scrapy
from scrapy import Request

class MySpider(scrapy.Spider):
    name = "scraper"
    allowed_domains = ["example.com"]
    start_urls = ['http://example.com/viewData']

    def parse(self, response):
        ids = ['1', '2', '3']
        for id in ids:
            # The following request scrapes for all ids
            yield scrapy.FormRequest.from_response(response,
                ...
                callback=self.parse1)
            # The following request scrapes only for the 1st id
            yield Request(url="http://example.com/viewSomeOtherData",
                callback=self.intermediateMethod)

    def parse1(self, response):
        # Data scraped here using selectors
        pass

    def intermediateMethod(self, response):
        yield scrapy.FormRequest.from_response(response,
            ...
            callback=self.parse2)

    def parse2(self, response):
        # Some other data scraped here
        pass
I want to scrape two different pages for a single id.
Changing the following request:

yield Request(url="http://example.com/viewSomeOtherData",
              callback=self.intermediateMethod)

to:

yield Request(url="http://example.com/viewSomeOtherData",
              callback=self.intermediateMethod,
              dont_filter=True)

worked for me.
Scrapy has a duplicate request filter, and it is likely dropping your requests: every loop iteration yields a Request for the same second URL, so after the first one, the rest are discarded as duplicates. Try adding dont_filter=True after the callback, as suggested by Steve.
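To make the behavior concrete, here is a toy model of a duplicate filter (this is a sketch for illustration, not Scrapy's actual RFPDupeFilter, which fingerprints the full request rather than comparing raw URLs): it remembers every URL it has seen and silently drops repeats unless dont_filter=True is set.

```python
class ToyDupeFilter:
    """Toy stand-in for Scrapy's duplicate-request filter (illustration only)."""

    def __init__(self):
        self.seen = set()

    def should_drop(self, url, dont_filter=False):
        if dont_filter:
            return False   # bypass the filter entirely, like dont_filter=True
        if url in self.seen:
            return True    # duplicate -> dropped silently
        self.seen.add(url)
        return False

f = ToyDupeFilter()

# Three loop iterations all yield a request for the same second URL,
# mirroring the loop in the question:
dropped = [f.should_drop("http://example.com/viewSomeOtherData")
           for _ in range(3)]
print(dropped)   # only the first request survives: [False, True, True]

# With dont_filter=True, every request goes through:
dropped2 = [f.should_drop("http://example.com/viewSomeOtherData", dont_filter=True)
            for _ in range(3)]
print(dropped2)  # [False, False, False]
```

This is why the second scraper appeared to work for only one id: the FormRequest built by from_response differs per id (different form data), but the plain Request targets an identical URL each time, so only the first copy is ever scheduled.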