I have a complex scraping application in Scrapy that runs in multiple stages (each stage is a function that calls the next stage of scraping and parsing). The spider downloads multiple targets, and each target consists of a large number of files. What I need is to call a processing function after all of a target's files have been downloaded; it cannot process them partially, it needs the whole set of files for the target at once. Is there a way to do this?
If you cannot wait until the whole spider is finished, you will have to write some logic in an item pipeline that keeps track of what you have scraped and executes a function at the right moment. Below is some logic to get you started: it counts the items you have scraped per target, and when the count reaches 100 it executes the target_complete method. Note that you will have to fill in the 'target' field of the item yourself, of course.
from collections import Counter

class TargetCountPipeline(object):
    def __init__(self):
        # Number of items scraped so far, per target.
        self.target_counter = Counter()
        # Expected number of files per target.
        self.target_number = 100

    def process_item(self, item, spider):
        target = item['target']
        self.target_counter[target] += 1
        # Fire exactly once, when the final item for this target arrives.
        if self.target_counter[target] == self.target_number:
            self.target_complete(target)
        return item

    def target_complete(self, target):
        # Process the complete set of files for this target here.
        pass
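To show how the pieces fit together, here is a minimal sketch of the spider side. The names MySpider, FileItem, and the pipeline path "myproject.pipelines.TargetCountPipeline" are placeholders you would adapt to your project; the key point is that every yielded item carries the 'target' field the pipeline counts on.

import scrapy

class FileItem(scrapy.Item):
    target = scrapy.Field()
    file_url = scrapy.Field()

class MySpider(scrapy.Spider):
    name = "files"
    # Enable the pipeline (module path is hypothetical; adjust to your project).
    custom_settings = {
        "ITEM_PIPELINES": {"myproject.pipelines.TargetCountPipeline": 300},
    }

    def parse(self, response):
        # Tag every item with the target it belongs to, so the
        # pipeline can count completed files per target.
        for url in response.css("a::attr(href)").getall():
            yield FileItem(target=response.url, file_url=response.urljoin(url))

If the number of files varies per target, you can replace the hard-coded target_number with a per-target expected count (for example, recorded when the target's file list is first parsed) and compare against that instead.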