I have a view in my Django project that fires off a Celery task. The task triggers a few map/reduce jobs via subprocess/fabric, and the results of the Hadoop job are stored on disk; nothing is stored in the database. Once the Hadoop job has completed, the Celery task sends a Django signal to say it is done, something like this:
# tasks.py
from models import MyModel
import signals
from fabric.operations import local
from celery.task import Task

class Hadoopification(Task):
    def run(self, my_model_id, other_args):
        my_model = MyModel.objects.get(pk=my_model_id)
        self.hadoopify_function(my_model, other_args)
        signals.complete_signal.send(
            sender=self,
            my_model_id=my_model_id,
            complete=True,
        )

    def hadoopify_function(self, my_model, other_args):
        local("""hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.reduce.tasks=0 -file hadoopify.py -mapper "parse_mapper.py 0 0" -input /user/me/input.csv -output /user/me/output.csv""")
What truly baffles me is that the Django runserver reloads when the Celery task runs, as if I had changed some code somewhere in the Django project (which I have not, I can assure you!). From time to time this even causes errors in the runserver command, where I see output like the following before runserver reloads and is OK again (note: this error message is very similar to the problem described here).
Unhandled exception in thread started by <function inner_run at 0xa18cd14>
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
    if not enabled():
TypeError: 'NoneType' object is not callable

Original exception was:
Traceback (most recent call last):
  File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/management/commands/runserver.py", line 60, in inner_run
    run(addr, int(port), handler)
  File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 721, in run
    httpd.serve_forever()
  File "/usr/lib/python2.6/SocketServer.py", line 224, in serve_forever
    r, w, e = select.select([self], [], [], poll_interval)
AttributeError: 'NoneType' object has no attribute 'select'
I've narrowed the problem down to the calls made to hadoop: replacing local("""hadoop ...""")
with local("ls")
does not cause any problems with reloading the Django runserver. There are no bugs in the hadoop code; it runs just fine on its own when it's not called by Celery.
Any idea of what might be causing this?
There is some discussion about this on the Fabric GitHub page here, here and here. Another option for raising an error is to use the settings context manager:
from fabric.api import settings

class Hadoopification(Task):
    ...
    def hadoopify_function(self, my_model, other_args):
        with settings(warn_only=True):
            result = local(...)
        if result.failed:
            # access result.return_code, result.stdout, result.stderr
            raise UsefulException(...)
This has the advantage of allowing access to the return code and all of the other attributes on the result.
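For anyone who wants to try the same check-then-raise idea without Fabric installed, here is a rough sketch that swaps Fabric's local for the standard library's subprocess module. It is not the Fabric API and is written for Python 3, not the Python 2.6 setup in the question. The names CommandFailed and run_checked are made up for illustration. The point is only that a failed command surfaces as an ordinary exception carrying the return code and output, which the Celery task can catch or let propagate.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Hypothetical exception that keeps the details of a failed command."""

    def __init__(self, return_code, stdout, stderr):
        super().__init__("command exited with status %d" % return_code)
        self.return_code = return_code
        self.stdout = stdout
        self.stderr = stderr


def run_checked(argv):
    """Run argv and return its stdout; raise CommandFailed on a non-zero exit.

    Assumption: this is a subprocess stand-in for Fabric's local(), used here
    only to illustrate the pattern of inspecting the result before raising.
    """
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    if proc.returncode != 0:
        raise CommandFailed(proc.returncode, proc.stdout, proc.stderr)
    return proc.stdout


if __name__ == "__main__":
    # A child process that deliberately fails, standing in for a broken hadoop call.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    try:
        run_checked(failing)
    except CommandFailed as exc:
        print("failed with return code", exc.return_code)
```

Raising a regular Exception subclass means the failure shows up in the task's own traceback, where it is much easier to spot than a process exit.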