We are currently using Redis To Go with our Heroku-hosted Python application.
We use Redis with python-rq purely as a task queue to allow delayed execution of some time-intensive tasks. A task retrieves some data from a PostgreSQL database and writes the results back to it, so no valuable data is stored in the Redis instance at all. We notice that, depending on the number of jobs executed, Redis consumes more and more memory (growth of ~10 MB/hour). A FLUSHDB command on the CLI fixes this (takes it down to ~700 kB of RAM used) until the RAM is full again.
According to our (unchanged standard) settings, a job result is kept for 500 seconds. Over time, some jobs of course fail, and they are moved to the failed queue.
Sorry for the pretty noobish questions, but I'm new to the topic of queuing and after researching for 2+ days I've reached a point where I don't know what to do next. Thanks, KH
After two more days of playing around, I have found the problem. I would like to share this with you, along with the tools that were helpful:
Core Problem
The actual problem was that we had neglected to cast an object to a string before saving it to the PostgreSQL database. Without this cast, the string representation still ended up in the DB (because the __str__() function of the respective object returned exactly the representation we wanted); however, the whole object was passed to Redis. After it was passed to Redis, the associated task crashed with an UnpickleError exception. This consumed 5 MB of RAM that was not freed after the crash.
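To illustrate the size difference between passing a whole object and passing its string form, here is a minimal sketch. The Report class and its contents are hypothetical stand-ins (the original object is not shown in this post); the only assumption about the queue is that values are pickled before they reach Redis.

```python
import pickle


class Report:
    """Hypothetical stand-in for the object from the post."""

    def __init__(self, rows):
        self.rows = rows  # large payload carried by the object

    def __str__(self):
        # The short representation that was meant to be stored.
        return f"Report with {len(self.rows)} rows"


report = Report(list(range(100_000)))

# Compare what would be serialized with and without the explicit cast.
size_as_object = len(pickle.dumps(report))
size_as_text = len(pickle.dumps(str(report)))

print(f"pickled object: {size_as_object} bytes")
print(f"pickled str():  {size_as_text} bytes")
```

Casting with str(...) before handing the value on means only the short text is serialized, not the object and everything it references.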
Additional Actions
To reduce memory footprint further, we implemented the following supplementary actions (mind that we are saving everything to a separate DB so the results that Redis saves are not used at all in our application):
We set the result TTL to 0 in the call enqueue_call([...] result_ttl=0).
We defined a custom exception handler, black_hole, to take all exceptions and return False. This prevents RQ from moving a task to the failed queue, where it would still use a bit of memory. Exceptions are sent to us via e-mail beforehand so that we can keep track of them.
Useful tools along the way:
We just worked with redis-cli.
redis-cli info | grep used_memory_human --> shows current memory usage. Ideal to compare memory footprint before and after a task was executed.
redis-cli keys '*' --> shows all current keys that exist. This overview led me to the insight that some tasks are not deleted even though they should have been (as written above, they crashed with an UnpickleError and because of this were not removed).
redis-cli monitor --> shows a realtime overview of what is happening in Redis. This helped me find out that the objects that were moved back and forth were too massive.
redis-cli debug object <key> --> shows a dump of the key's value.
redis-cli hgetall <key> --> shows a more readable dump of the key's value (especially useful for the specific use case of using Redis purely as a task queue, since it seems that the tasks are created by python-rq in this format).
Furthermore, I can answer some of the questions I had posted above:
From the docs I know that the 500 sec TTL means that a key is then "expired", but not really deleted. Does the key still consume memory at this point? Can I somehow change this behavior?
Actually, they are deleted, just as the docs imply.
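If you want to check the expiry behavior from Python rather than from redis-cli, something along these lines may help. It is only a sketch: it assumes a redis-py style client (scan_iter, ttl and info are redis-py methods), that ttl returns -1 for a key without expiry as recent redis-py versions do, and that RQ stores jobs under keys starting with rq:job:. Both helper names are made up for this example.

```python
def keys_without_expiry(client, pattern="rq:job:*"):
    """Return keys matching `pattern` that have no TTL set.

    `client` is expected to behave like redis.Redis (redis-py):
    scan_iter(match=...) yields key names, and ttl(key) returns -1
    for a key that exists but has no expiry.
    """
    leftovers = []
    for key in client.scan_iter(match=pattern):
        if client.ttl(key) == -1:
            leftovers.append(key)
    return leftovers


def used_memory(client):
    """Same figure as `redis-cli info | grep used_memory_human`."""
    return client.info("memory")["used_memory_human"]
```

Keys that show up in keys_without_expiry long after their job should have finished are candidates for the leftover entries described in this answer.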
Does it have something to do with the failed queue (which apparently does not have a TTL attached to the jobs, meaning (I think) that these are kept forever)?
Surprisingly, the jobs for which Redis itself crashed were not moved to the failed queue; they were just "abandoned", meaning the values remained but RQ did not handle them the way it normally handles failed jobs.
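For reference, the two supplementary actions from "Additional Actions" could look roughly like this. This is a sketch, not our exact code: the logging call stands in for the e-mail notification, enqueue_without_result is a made-up helper name, and queue is assumed to be an rq.Queue. How the handler is registered depends on the RQ version; recent releases accept Worker(queues, exception_handlers=[black_hole]), which is an assumption to check against the version you run.

```python
import logging

logger = logging.getLogger("worker.errors")


def black_hole(job, exc_type, exc_value, traceback):
    """RQ-style exception handler that swallows every exception.

    Returning False tells RQ to stop walking the handler chain, so the
    handler that would move the job to the failed queue is not reached.
    """
    # Stand-in for the e-mail notification mentioned above.
    logger.error("job %r failed: %s: %s", job, exc_type.__name__, exc_value)
    return False


def enqueue_without_result(queue, func, *args):
    """Enqueue `func` so that its return value is not kept in Redis."""
    # result_ttl=0 asks RQ to discard the job result immediately.
    return queue.enqueue_call(func=func, args=args, result_ttl=0)
```

This only makes sense when, as in our case, the task itself writes everything it produces to a separate database and nothing reads the result back from Redis.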
Relevant Documentation