We have a Django webapp that uses Celery 4.x to run tasks asynchronously. The main tasks require the Django/Celery code to perform network operations against 20-100 other servers. Each request we send to these servers is identical, i.e. the user sends a command to Django, which then tells Celery to send the exact same command to each of the 20-100 servers. The problem is that with the basic configuration of Celery, if we have, say, 4 workers, then Celery will communicate with only 4 servers at a time. To fix this, we tried using Celery with gevent. However, gevent uses coroutines rather than full threads, and for our network operations we are using our own Python module written in C. In other words, we are NOT using the Python socket or requests modules that can be monkey patched.
So, what we would like to do is the following. For the sake of argument, let's say our C communication module is called "cnet". If we have 20 servers that we must communicate with, we would have a Celery task function that does something like this:
import threading

# This uses our cnet module (written in C) to connect to a single server
def connect_to_server(server, user_data):
    response = cnet.execute_request(server, user_data)
    output = do_something_with_response(response)
    return output

@task
def send_something_important_to_all_servers(dest_servers, user_data):
    threads = []
    for server in dest_servers:
        t = threading.Thread(target=connect_to_server, args=(server, user_data))
        t.start()
        threads.append(t)
    # Wait for all threads to finish before the task returns
    for t in threads:
        t.join()
I have a number of questions about how we can implement this. Initially we used the prefork pool with multiple workers, one task per worker, but that didn't scale. Next we used gevent, but we didn't do anything special: we just launched Celery with 4 workers, each worker with some large number of greenlets, much like what was done here: https://groups.google.com/forum/?fromgroups=#!topic/celery-users/RNZLiNyykQQ That of course still showed the same problem: with 4 workers we only got 4 things running at the same time.
Now I have read that we can use the eventlet pool, which has something called tpool that we can use inside a task to spawn multiple threads. The docs say this is particularly useful for situations like ours, where a native C network module cannot be monkey patched. So, is this the best approach for us, i.e. using eventlet with tpool? Is there any reason to use gevent rather than eventlet in our situation, and if so, is there a gevent equivalent to tpool? Does anyone have an example of Celery code that handles a situation like this, i.e. that uses native networking code that cannot be monkey patched?
Your best option is to do the multiplexing in C and present it to Celery as a single function that performs all the requests. There are a variety of approaches for doing this in C; OS threads are but one. Pick what you know best and are most comfortable with.
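From the Python side, that could look something like the sketch below. cnet.execute_requests is a hypothetical batched entry point you would add to the C module: it fans out to all servers internally (with OS threads, poll/epoll, or whatever you prefer) and returns one response per server.

# cnet, do_something_with_response and @task are the names from the question.
# cnet.execute_requests(servers, user_data) is a hypothetical batched C
# function that performs all requests concurrently inside the C module.
@task
def send_something_important_to_all_servers(dest_servers, user_data):
    responses = cnet.execute_requests(dest_servers, user_data)  # one blocking call
    return [do_something_with_response(r) for r in responses]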
Eventlet's tpool will work, but be aware that its size is only 20 threads by default; you may wish to increase it.
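A minimal sketch of a task using tpool, assuming the cnet, do_something_with_response and Celery app names from the question. The worker would be started with the eventlet pool (e.g. celery -A tasks worker -P eventlet -c 100), and the thread pool size can be raised with the EVENTLET_THREADPOOL_SIZE environment variable.

import eventlet
from eventlet import tpool
from celery import Celery

import cnet  # the C module from the question

app = Celery('tasks')  # stand-in for your existing Celery app

def connect_to_server(server, user_data):
    # tpool.execute runs the blocking C call in a real OS thread and
    # yields to other greenlets while waiting for the result.
    response = tpool.execute(cnet.execute_request, server, user_data)
    return do_something_with_response(response)

@app.task
def send_something_important_to_all_servers(dest_servers, user_data):
    # One greenlet per server; each offloads its blocking call to the
    # OS thread pool via tpool.execute.
    pool = eventlet.GreenPool()
    return list(pool.imap(lambda s: connect_to_server(s, user_data),
                          dest_servers))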
Monkey patching will still work if your C module uses the Python socket module internally for its I/O. That's probably the cheapest and fastest route.
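For completeness, the standard eventlet monkey patching is just this (it only helps if the C module routes its I/O through Python's socket module, which the question says it currently does not):

import eventlet
eventlet.monkey_patch()  # patch socket, time, etc. before other imports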
Update: eventlet.tpool.execute() yields. It's not explicitly written in the documentation, because any blocking API would defeat the purpose of the library. Any number of coroutines (smaller or greater than the tpool size) will work as expected; the tpool size only limits the number of OS threads running at the same time.