The script that I've written to add tasks to my Celery queue is leaking memory (to the point where the kernel kills the process after 20 minutes). In this script, I'm just executing the same 300 tasks repeatedly, every 60 seconds (inside a while True: loop).
The parameters passed to the task, makeGroupRequest(), are dictionaries containing strings, and according to hpy and objgraph, dicts and strings are also what's growing uncontrollably in memory. I've included the output of hpy below on successive iterations of the loop.
I've spent days on this, and I can't understand why memory would grow uncontrollably, considering nothing is re-used between loops. If I skip retrieving the results, the memory doesn't appear to leak (so it's really the .get() call that is leaking memory). How can I determine what's going on and how to stop the growth?
Here is an outline of the code that's executing. I'm using the rpc:// backend.
while True:
    # preparation is done here to set up the arguments for the tasks (processedChains)

    chains = []
    for processedChain in processedChains:

        # shorthanding
        supportingData = processedChain["supportingDataAndCheckedGroups"]

        # init the first element, which includes the supportingData and the first group
        argsList = [(supportingData, processedChain["groups"][0])]

        # add in the rest of the groups
        argsList.extend([(groupInChain,) for groupInChain in processedChain["groups"][1:]])

        # actually create the chain
        chain = celery.chain(*[makeGroupRequest.signature(params, options={'queue': queue}) for params in argsList])

        # add this to the list of chains
        chains.append(chain)

    groupSignature = celery.group(*chains).apply_async()

    # this line appears to cause a large increase in memory each cycle
    results = groupSignature.get(timeout=2 * acceptableLoopTime)

    time.sleep(60)
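For reference, here is roughly how snapshots like the ones below can be collected between iterations. This is a minimal sketch, assuming the guppy (heapy) and objgraph packages are installed; the profile_iteration() helper and its label argument are just illustrative and not part of the script above.

from guppy import hpy
import objgraph

hp = hpy()

def profile_iteration(label):
    # Print the top object types by count/size -- the same
    # "Partition of a set of N objects" report shown below.
    print("=== %s ===" % label)
    print(hp.heap())
    # Print the object types whose counts grew since the previous call,
    # useful for confirming that dict/str/unicode are the types accumulating.
    objgraph.show_growth(limit=10)

Calling profile_iteration("Loop %d" % iteration) at the end of each pass through the while loop produces one dump per cycle.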
Here is the output of hpy on successive runs:
Loop 2:
Partition of a set of 366560 objects. Total size = 57136824 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 27065 7 17665112 31 17665112 31 dict (no owner)
1 122390 33 11966720 21 29631832 52 unicode
2 89133 24 8291952 15 37923784 66 str
3 45448 12 3802968 7 41726752 73 tuple
4 548 0 1631072 3 43357824 76 dict of module
5 11195 3 1432960 3 44790784 78 types.CodeType
6 9224 3 1343296 2 46134080 81 list
7 11123 3 1334760 2 47468840 83 function
8 1414 0 1274552 2 48743392 85 type
9 1414 0 1240336 2 49983728 87 dict of type
Loop 3:
Index Count % Size % Cumulative % Kind (class / dict of class)
0 44754 9 29240496 37 29240496 37 dict (no owner)
1 224883 44 20946280 26 50186776 63 unicode
2 89104 18 8290248 10 58477024 74 str
3 45455 9 3803288 5 62280312 79 tuple
4 14955 3 2149784 3 64430096 81 list
5 548 0 1631072 2 66061168 83 dict of module
6 11195 2 1432960 2 67494128 85 types.CodeType
7 11122 2 1334640 2 68828768 87 function
8 1402 0 1263704 2 70092472 88 type
9 1402 0 1236976 2 71329448 90 dict of type
It turns out this is a bug in Celery. Switching to the memcache backend completely resolves the memory leak. Hopefully the issue will be fixed in a subsequent version.
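For anyone hitting the same problem, the backend switch is a small configuration change. A minimal sketch, assuming a memcached server on the default local port and either the python-memcached or pylibmc client installed; the broker URL and app name here are placeholders:

from celery import Celery

# Use the memcached result backend instead of rpc:// to avoid the leak.
# 'cache+memcached://host:port/' is Celery's URL scheme for the cache backend.
app = Celery('tasks',
             broker='amqp://guest@localhost//',
             backend='cache+memcached://127.0.0.1:11211/')

On newer Celery versions the same URL can instead be set via the result_backend setting.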