I have the following while loop to scrape pages:
def after_login(self, response):
    i = 100000
    while i < 2000000:
        yield scrapy.Request("https://www.example.com/foobar.php?nr=" + str(i), callback=self.another_login)
        i += 1
The problem is that the process gets killed because it runs out of memory. Is there a way to tell the while loop to queue 1000 requests and, when those are done, to queue another 1000?
You should play around with Scrapy's settings. For instance, try decreasing CONCURRENT_REQUESTS and adding a DOWNLOAD_DELAY.
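As a minimal sketch, this is roughly what those two settings could look like in your project's settings.py (the values here are illustrative assumptions, not recommendations; tune them for your target site):

```python
# settings.py sketch -- example values, adjust to your needs.

# Maximum number of concurrent requests Scrapy will perform.
# The Scrapy default is 16; lowering it reduces memory and load.
CONCURRENT_REQUESTS = 4

# Seconds to wait between consecutive requests to the same site.
# The default is 0, i.e. no delay.
DOWNLOAD_DELAY = 0.5
```

Alternatively, you can set these per spider via the `custom_settings` class attribute instead of changing the project-wide settings.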
If that does not help, look into debugging the memory usage; see also: