Tags: python, python-2.7, memory, tensorflow, heapy

Python heapy shows constant memory usage, though 60 GB of RAM is exhausted in 10 minutes


I'm running a TensorFlow model which exhausts 60 GB of RAM in about 10 minutes while processing large images.

I've run heapy to try to pin down the leak, but heapy reports only about 90 MB of usage, and that figure stays constant.

I noted this article: "Python process consuming increasing amounts of system memory, but heapy shows roughly constant usage"

It suggested the issue might be memory fragmentation within Python (2.7 here), but that doesn't seem like a reasonable explanation for this case.

  • I have two Python Queues. A loader thread reads each image from disk and puts it on a raw queue.
  • A second thread pulls from the raw queue, preprocesses the image, and puts it on a ready queue.
  • The main thread draws batches of 8 images from the ready queue and runs them through TensorFlow training.
  • With batches of 8 images (each a ~25 MB numpy matrix), at least 24 * 25 MB should be held at any given moment between the current batch and the two queues, yet heapy shows only 90 MB. (A sketch of this pipeline follows the list.)
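Roughly, the pipeline looks like the sketch below. This is an illustrative reconstruction, not my actual code: the image shape, function names, and the zero-filled stand-in for a disk read are placeholders.

    import threading
    import Queue  # Python 2.7; the module is named "queue" in Python 3

    import numpy as np

    BATCH_SIZE = 8
    IMAGE_SHAPE = (2500, 2500)  # a float32 matrix of this shape is ~25 MB

    raw_queue = Queue.Queue()    # disk reader -> preprocessor
    ready_queue = Queue.Queue()  # preprocessor -> training loop

    def loader(paths):
        # Thread 1: read images from disk and put them on the raw queue.
        for path in paths:
            image = np.zeros(IMAGE_SHAPE, dtype=np.float32)  # stand-in for a disk read
            raw_queue.put(image)

    def preprocessor():
        # Thread 2: pull from the raw queue, preprocess, push to the ready queue.
        while True:
            image = raw_queue.get()
            ready_queue.put(image * (1.0 / 255.0))

    def next_batch():
        # Main thread: assemble a batch of 8 images for one training step.
        return np.stack([ready_queue.get() for _ in range(BATCH_SIZE)])

With 8 images in the current batch and images in flight in both queues, that's where the 24 * 25 MB lower bound comes from.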

So heapy is failing to see at least the ~600 MB of memory that I know must be held at any given moment.

Hence, if heapy can't see memory I know is there, I can't trust it to see where the leak is. Given how fast memory is being consumed, it's a virtual certainty that the image batches are responsible.

I'm using Python's threading module to start the loader and preprocessor threads. I've tried calling print h.heap() from within the thread code and from the main code, all with the same result.
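For reference, the measurement looks roughly like this. I'm assuming h comes from guppy's hpy() in the usual way; the report_memory name is just for illustration:

    from guppy import hpy

    h = hpy()

    def report_memory(tag):
        # Print heapy's view of objects on the Python heap at this point.
        print tag
        print h.heap()

Calling report_memory from the loader thread, the preprocessor thread, and the main training loop yields the same roughly constant ~90 MB everywhere.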


Solution

  • It turned out I had accidentally left a Python Queue unbounded. Simple fix (sketched below). It's odd that heapy didn't show the memory allocated by the Queue; memory_profiler did, and that's how I tracked down the issue.

    It sure would have been a beautiful thing if heapy had said, "hey, there's this Queue object using more memory than you were expecting."
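For anyone hitting the same thing, the shape of the fix and of the memory_profiler diagnosis is sketched below. The maxsize value and the load_and_preprocess helper are placeholders, not my actual code:

    import Queue  # Python 2.7

    import numpy as np
    from memory_profiler import profile

    # Before: Queue.Queue() with no maxsize is unbounded, so a fast producer
    # thread can buffer an arbitrary number of ~25 MB images while the
    # consumer lags behind.
    # After: a bounded queue makes put() block once 8 items are waiting.
    ready_queue = Queue.Queue(maxsize=8)

    def load_and_preprocess(path):
        # Placeholder for reading an image from disk and preprocessing it.
        return np.zeros((2500, 2500), dtype=np.float32)

    @profile  # memory_profiler prints a line-by-line memory report for this function
    def fill_ready_queue(paths):
        for path in paths:
            ready_queue.put(load_and_preprocess(path))

memory_profiler's @profile decorator reports the memory increment attributed to each line, which makes this kind of unbounded accumulation easy to spot. Bounding the queue applies backpressure: once it's full, the producer's put() blocks until the consumer catches up, so memory stays capped at roughly maxsize images per queue.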