postgresql celery django-celery hstore

Using Postgres hstore with celery?


Is it possible (and/or would it be effective) to use Postgres' hstore as a broker for celery?

I'm restricted (absent some very compelling reason) to using a Postgres db. I have a django app with celery tasks. Currently I am using the standard database support, but the celery docs strongly recommend against that approach for anything beyond very small task queues. I was looking into installing redis when I came across some info about the hstore feature of Postgres, and the suggestion that it provides equivalent functionality to redis.

I haven't seen anything about using hstore specifically for celery, though, which seems odd if it really can substitute for redis. Looking through the celery backend code at
https://github.com/celery/celery/blob/master/celery/backends/base.py
it looks like the base celery KeyValueStoreBackend is a pretty simple api:

def get(self, key):
    raise NotImplementedError('Must implement the get method.')

def mget(self, keys):
    raise NotImplementedError('Does not support get_many')

def set(self, key, value):
    raise NotImplementedError('Must implement the set method.')

def delete(self, key):
    raise NotImplementedError('Must implement the delete method')

def incr(self, key):
    raise NotImplementedError('Does not implement incr')

but before I potentially pour a bunch of time into this it seemed worth asking whether there's something I'm missing that would argue against implementing this API using hstore and using that as a celery backend.
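To make the shape of that API concrete, here is a minimal sketch of the five methods backed by a plain Python dict. The class name `DictKeyValueStore` and the use of a lock are illustrative assumptions, not celery code. The lock is there because a counter like `incr` is only useful if concurrent callers never lose an update, which is a requirement the method signatures alone do not show.

```python
import threading


class DictKeyValueStore:
    """Toy in-memory store exposing the same five methods (hypothetical, not celery code)."""

    def __init__(self):
        self._data = {}
        # Assumption: writers may run concurrently, so updates are serialized.
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def mget(self, keys):
        return [self._data.get(key) for key in keys]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)

    def incr(self, key):
        # Read-modify-write under one lock so two callers cannot both see the old value.
        with self._lock:
            self._data[key] = int(self._data.get(key, 0)) + 1
            return self._data[key]


store = DictKeyValueStore()
store.set("task-1", "SUCCESS")

# Eight threads bump the same counter; with the lock, no increment is lost.
threads = [threading.Thread(target=store.incr, args=("counter",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Any real storage layer behind these methods would need to offer the same guarantee for `incr` across processes and machines, not just across threads in one process.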

E.g., does celery have requirements that aren't captured by this API (such as atomicity, scalability, or reliability under load)? Would implementing this using hstore fail to provide a substantial improvement over the existing database backend? I'm fairly new to celery and have never used hstore, so I'm not sure what (if anything) I'm overlooking.


Solution

  • hstore absolutely does not provide "equivalent functionality to redis".

    An hstore field is not a key-value database in a field. Trying to use it that way will lead to pain and terrible performance, because the whole record containing the hstore field must be rewritten for every update. The same challenges that apply to task queuing in a relational DB also apply to hstore: you'll get at best the performance of a single worker, and you won't get real concurrency even though it might superficially look like you do.

    hstore is just a hash map in a database field. It's very useful, but it's not magic, and it won't free you from the underlying challenges of using an RDBMS for message queuing.

    If you want a message queue, use a message queue. PGQ is one good option. Alternatively, check out dedicated message queue tools like ZeroMQ.
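The point about rewriting the whole record can be illustrated with a toy model. The sketch below is plain Python, not Postgres code: `Row` and `update_key` are hypothetical names, and real Postgres storage is considerably more involved. It only models the claim above that changing one key in an hstore field means writing out a complete new version of the record.

```python
class Row:
    """Toy stand-in for one table row holding an hstore-like dict (not Postgres code)."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)


def update_key(row, key, value):
    """Return a new row version plus the number of key/value pairs written."""
    new_attrs = dict(row.attrs)  # copies every existing pair, not just the changed one
    new_attrs[key] = value
    return Row(new_attrs), len(new_attrs)


# One row holding the state of 1000 tasks in a single hstore-like field.
row = Row({f"task-{i}": "PENDING" for i in range(1000)})

# Changing the state of a single task still writes all 1000 pairs.
row, pairs_written = update_key(row, "task-0", "SUCCESS")
print(pairs_written)  # 1000
```

In this model the cost of each update grows with the number of keys stored in the field, which is why packing a whole task queue into one hstore value scales poorly.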