python · google-app-engine · queue · deferred

defer many tasks simultaneously on google-app-engine


I am developing a Python app on Google App Engine. I have a cron job which imports a list of 20 fresh files from an S3 bucket into a GS bucket every day.

Here is my code:

import webapp2
import yaml
from google.appengine.ext import deferred

class CronTask(webapp2.RequestHandler):

    def get(self):
        with open('/my/config/file') as config_file:
            config_dict = yaml.load(config_file)
        for file_to_load in config_dict:
            deferred.defer(my_import_function, file_to_load)


app = webapp2.WSGIApplication([
    ('/', CronTask)
], debug=True)

Note that my_import_function is part of another package and takes some time to complete.

My question: is it a good idea to use deferred.defer for this task, or should I proceed differently to launch my_import_function for all my arguments?


Solution

  • You should use the taskqueue, but depending on how many tasks you have you may not want to use deferred.defer().

    With deferred.defer() you can only enqueue one task per call. If you are enqueueing a lot of tasks, that is inefficient, because each call makes its own round trip to the task queue service. This loop is slow:

    for x in some_list:
        deferred.defer(my_task, x)
    

    With a lot of tasks, it is much more efficient to do something like this:

    task_list = []
    for x in some_list:
        task_list.append(taskqueue.Task(url="/task-url", params=dict(x=x)))
    taskqueue.Queue().add(task_list)
    

    About a year ago, I did a timing comparison, and the latter was at least an order of magnitude faster than the former.
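
    One caveat: Queue.add() accepts at most 100 tasks per call, so for a longer list you would enqueue in batches. Below is a minimal sketch; the chunking helper is plain Python, while the commented-out enqueue part assumes the App Engine SDK and the hypothetical /task-url handler from the example above.

    ```python
    def chunks(seq, size=100):
        """Yield successive slices of at most `size` items from `seq`."""
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    # In the App Engine handler (requires the SDK):
    # from google.appengine.api import taskqueue
    # task_list = [taskqueue.Task(url="/task-url", params=dict(x=x))
    #              for x in some_list]
    # for batch in chunks(task_list):
    #     taskqueue.Queue().add(batch)  # each call stays under the 100-task limit
    ```

    For the 20 files in the question this limit never bites, but the batching costs nothing and keeps the code safe if the list grows.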