I'm building a web service for iterative batch processing of data using CherryPy. The ideal workflow is as follows:
The key consideration here is that the processing should run as fast as possible with each iteration starting as soon as the previous one finishes, regardless of the amount of data in the queue. There's no upper bound on how long each iteration can take so I can't create a fixed schedule for it to run on.
There are a few examples of using BackgroundTask (like this one), but I've yet to find one that deals with returning data, or one that runs tasks as fast as possible rather than on a fixed schedule.
I'm not wedded to the BackgroundTask solution, so if anyone can offer an alternative I'd be more than happy. It feels like there's a solution within the framework though.
Don't run the processing with BackgroundTask: it runs in a thread inside the CherryPy process and, because of the GIL, CPU-bound work there competes with the threads that answer requests, so the server can become unresponsive. Use a queue solution that runs your background tasks in a different process, like Celery or RQ.
Here is a detailed example using RQ. RQ uses Redis as its message broker, so first of all you need to install and start Redis.
Then create a module (mytask in my example) containing the long-running background functions:
import time


def long_running_task(value):
    # Simulate slow work, then return a value the web app can display.
    time.sleep(15)
    return len(value)
Start one RQ worker (or several, if you want tasks to run in parallel). The Python interpreter running the workers must be able to import your mytask module, so export PYTHONPATH before starting the worker if the module is not already on the path:
# rq worker
Below is a very simple CherryPy webapp that shows how to use the RQ queue:
from html import escape

import cherrypy
from redis import Redis
from rq import Queue

from mytask import long_running_task


class BackgroundTasksWeb(object):

    def __init__(self):
        # Jobs are sent to the default RQ queue on the local Redis server.
        self.queue = Queue(connection=Redis())
        self.jobs = []

    @cherrypy.expose
    def index(self):
        # Form to submit a new job, plus an iframe showing the job list.
        html = ['<html>', '<body>']
        html += ['<form action="job">', '<input name="q" type="text" />', '<input type="submit" />', "</form>"]
        html += ['<iframe width="100%" src="/results"></iframe>']
        html += ['</body>', '</html>']
        return '\n'.join(html)

    @cherrypy.expose
    def results(self):
        # The page reloads itself every 2 seconds to pick up status changes.
        html = ['<html>', '<head>', '<meta http-equiv="refresh" content="2" >', '</head>', '<body>']
        html += ['<ul>']
        # Escape the user-supplied input before echoing it back as HTML.
        html += ['<li>job:{} status:{} result:{} input:{}</li>'.format(j.get_id(), j.get_status(), j.result, escape(j.args[0])) for j in self.jobs]
        html += ['</ul>']
        html += ['</body>', '</html>']
        return '\n'.join(html)

    @cherrypy.expose
    def job(self, q):
        # Enqueue the task for an RQ worker and return to the form at once.
        job = self.queue.enqueue(long_running_task, q)
        self.jobs.append(job)
        raise cherrypy.HTTPRedirect("/")


cherrypy.quickstart(BackgroundTasksWeb())
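The results page above gets the returned data by re-reading the job on every refresh. If you would rather wait for a single job from Python code, a polling helper along these lines could work. This is only a sketch: wait_for_result is a hypothetical name, not part of RQ, and it assumes get_status() reports 'finished' and 'failed' for completed and failed jobs and that result holds the return value, as used in the webapp.

```python
import time


def wait_for_result(job, timeout=30.0, poll_interval=0.5):
    """Poll `job` until it finishes, then return its result.

    `job` is any object exposing `get_status()` and `result` the way the
    jobs in the webapp are used. The status strings compared below are an
    assumption about what `get_status()` returns.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = job.get_status()
        if status == 'finished':
            return job.result
        if status == 'failed':
            raise RuntimeError('job failed')
        if time.monotonic() >= deadline:
            raise TimeoutError('job did not finish in time')
        # Sleep between polls so the loop does not spin on the job store.
        time.sleep(poll_interval)
```

Don't call something like this inside a CherryPy handler for long jobs, since it would tie up a request thread for the whole wait.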
In a production webapp I would use the Jinja2 template engine to generate the HTML, and most likely WebSockets to push job status updates to the browser.
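On the requirement that each iteration starts as soon as the previous one finishes, whatever happens to be queued: one way to get that without a fixed schedule is for the worker side to block for the first pending item and then take everything else that is already waiting as one batch. The sketch below uses the standard library queue.Queue purely as a stand-in for wherever your pending data actually lives, and next_batch is a hypothetical helper name.

```python
import queue


def next_batch(pending, idle_timeout=None):
    """Return the next batch of items from `pending`.

    Blocks until at least one item is available (or raises queue.Empty
    after `idle_timeout` seconds), then adds whatever else is already
    queued without waiting any further.
    """
    batch = [pending.get(timeout=idle_timeout)]
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            return batch


if __name__ == '__main__':
    pending = queue.Queue()
    for item in ('a', 'bb', 'ccc'):
        pending.put(item)
    # Everything queued so far is processed as a single iteration.
    print(next_batch(pending))
```

A processing loop then reduces to calling next_batch repeatedly and handling each batch in turn, so a new iteration begins the moment the previous one returns, and the loop simply sleeps in get() while nothing is queued.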