python, heroku, python-requests, cpython

Python requests memory usage on Heroku


Some observations on Heroku that don't completely mesh with my mental model.

My understanding is that CPython will never release memory back to the OS once it has been allocated. So we should never observe a decrease in the resident memory of a CPython process. And this has in fact been my observation from occasionally profiling my Django application on Heroku: sometimes the resident memory increases, but it never decreases.

However, sometimes Heroku alerts me that my worker dyno is using >100% of its memory quota. This generally happens when a long-running, response-data-heavy HTTPS request that I make to an external service (using the requests library) fails due to a server-side timeout. In that case, memory usage spikes way past 100%, then gradually drops back below 100% of quota, at which point the alarm ceases.

My question is: how is this memory released back to the OS? AFAIK it can't be CPython releasing it. My guess is that the incoming bytes from the long-running TCP connection are being buffered by the OS, which has the power to de-allocate them. It's murky to me when exactly "ownership" of the TCP bytes is transferred to my Django app. I'm certainly not explicitly reading lines from the input stream; I delegate all of that to requests.
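
Roughly, the call in question looks like this (the URL and timeout are placeholders, not my real values):

    import requests

    # Hypothetical example of the kind of call that triggers the spike:
    # one long-running HTTPS request whose large response body is read
    # entirely by requests on my behalf.
    resp = requests.get("https://external-service.example/big-report",
                        timeout=300)
    data = resp.content  # the whole response body ends up in process memory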


Solution

  • Apparently, at one time, CPython did NOT ever release memory back to the OS. Then a patch was introduced in Python 2.5 that allows memory to be released under certain circumstances, detailed here. Therefore it's no longer true to say that Python doesn't release memory; it's just that Python doesn't often release memory, because it doesn't handle memory fragmentation very well.

    At a high level, Python keeps track of its memory in 256K blocks called arenas. Pools of objects are held in these arenas. Python is smart enough to free arenas back to the OS when they're empty, but it still doesn't handle fragmentation across arenas very well.
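
    A rough way to watch this from the outside (a Linux-only sketch; how much RSS actually drops depends on the allocator and the CPython version) is to compare resident memory before and after freeing a large batch of uniformly sized buffers:

        import gc

        def rss_mb():
            # Current resident set size, read from /proc (Linux only).
            with open("/proc/self/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024.0

        print("baseline:  %6.1f MB" % rss_mb())
        chunks = [b"x" * 8192 for _ in range(25000)]   # ~200 MB of 8K buffers
        print("allocated: %6.1f MB" % rss_mb())
        del chunks
        gc.collect()
        print("freed:     %6.1f MB" % rss_mb())   # typically drops most of the way back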

    In my particular circumstance, I was reading large HTTP responses. If you dig down the call chain starting with HTTPAdapter.send() in the requests library, you'll eventually find that the socket file object's read() in the Python socket library is making a system call to receive from its socket in chunks of 8192 bytes (the default buffer size). This is the point at which the OS copies bytes from the kernel into the process, where CPython designates them as string objects of size 8K and shoves them into an arena. Note that StringIO, which is the Python-land buffer object for sockets, simply keeps a list of these 8K strings rather than mushing them together into one super-string object.
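
    The same chunking is visible from user code if the response is streamed instead of buffered in full; this sketch (URL and timeout are placeholders, and iter_content chunks aren't guaranteed to line up exactly with individual socket reads) just makes the 8192-byte units explicit:

        import requests

        resp = requests.get("https://external-service.example/big-report",
                            stream=True, timeout=300)

        total = 0
        for chunk in resp.iter_content(chunk_size=8192):
            # Each chunk is at most one 8192-byte string, the same unit the
            # socket layer reads with; here we discard chunks instead of
            # accumulating them all in a buffer.
            total += len(chunk)
        print("received %d bytes" % total)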

    Since 8K fits precisely 32 times into 256K, I think what is happening is that the received bytes are nicely filling up entire arenas without much fragmentation. These arenas can then be freed to the OS when the 8K strings filling them are deleted.

    I think I understand why the memory is released gradually (asynchronous garbage collection?), but I still don't understand why it takes so long to release after a connection error. If the memory release always took this long, I should be seeing these memory usage errors all the time, because my Python memory usage should spike whenever one of these calls is made. I've checked my logs, and I can sometimes see these violations last for minutes. That seems like an insanely long interval for memory release.

    Edit: I have a solid theory on the issue now. This error is being reported to me by a logging system that keeps a reference to the last traceback. The traceback maintains a reference to all the variables in its frames, including the StringIO buffer, which in turn holds references to all the 8K strings read from the socket. See the note under sys.exc_clear(): "This function is only needed in only a few obscure situations. These include logging and error handling systems that report information on the last or current exception."

    Therefore, in the exception case, the 8K-string ref counts don't drop to zero and immediately empty their arenas as they would on the happy path; we have to wait for the cyclic garbage collector to detect the reference cycles that keep them alive.
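
    The effect is easy to reproduce without any networking (the buffer sizes and names here are made up for illustration): stash the traceback the way a logging system would, and the frame locals it references stay alive:

        import gc
        import sys

        def read_response():
            # Stand-in for the socket buffer: a list of 8K strings.
            chunks = [b"x" * 8192 for _ in range(1000)]   # ~8 MB
            raise RuntimeError("simulated server-side timeout")

        saved = None
        try:
            read_response()
        except RuntimeError:
            # A logging/error-reporting system that keeps the traceback
            # keeps the frame, and therefore `chunks`, reachable.
            saved = sys.exc_info()

        gc.collect()
        # The ~8 MB of chunks is still alive here, reachable via saved[2].
        saved = None   # dropping the traceback (or calling sys.exc_clear()
                       # on Python 2) finally lets the memory go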

    The GC delay is compounded by the fact that when this exception occurs, lots of objects are allocated over the 5 minutes until timeout, which I'm guessing is plenty of time for many of the 8K strings to make it into the oldest generation. With the default GC thresholds of (700, 10, 10), it takes roughly 700*10 allocations for an object to be promoted into generation 2, the oldest. That comes out to 7000*8192 ~= 57MB, which means that all the strings received before the last ~57MB of the bytestream have been promoted to the oldest generation.
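
    Spelled out as a quick sketch (treating each 8K chunk as one tracked allocation, which is a simplification):

        import gc

        gen0, gen1, gen2 = gc.get_threshold()   # (700, 10, 10) by default
        chunk = 8192                             # bytes per socket read

        # A generation-0 collection runs roughly every `gen0` net allocations,
        # and survivors reach the oldest generation after `gen1` such
        # collections, so ~gen0 * gen1 allocations separate a chunk from
        # the oldest generation.
        promoted_after = gen0 * gen1 * chunk
        print("%.0f MB" % (promoted_after / 1e6))   # ~57 MB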

    Intervals on the order of minutes still seem awfully long for a collection of the oldest generation, but I suppose it's possible. Recall that GC isn't triggered by allocations alone; the actual trigger is allocations - deallocations > threshold.
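
    Those counters can be inspected directly; gc.get_count()'s first entry is the net allocations-minus-deallocations that gets compared against the first threshold:

        import gc

        # count[0] is net allocations minus deallocations of tracked objects
        # since the last collection; count[1] and count[2] count collections
        # of the next-younger generation. Generation 0 is collected when
        # count[0] exceeds the first threshold.
        print(gc.get_count())      # e.g. (312, 4, 1)
        print(gc.get_threshold())  # (700, 10, 10)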

    TL;DR Large responses fill up socket buffers whose 8K strings fill up arenas without much fragmentation, allowing Python to actually release their memory back to the OS. In unexceptional cases, this memory is released as soon as whatever context referenced the buffers exits, because the ref count on the buffers drops to zero, triggering immediate reclamation. In exceptional cases, as long as the traceback is alive, the buffers are still referenced, so we have to wait for garbage collection to reclaim them. If the exception occurred in the middle of the connection and a lot of data had already been transmitted, then by the time of the exception many buffers will have been promoted to an older generation, and we will have to wait even longer for garbage collection to reclaim them.