I have a memory leak issue with my program. I'm using muppy from the pympler library to print a list of the biggest PyObjects in my program sorted by size in bytes. Here is a reproducible example of how to use it:
$ pip install pympler
import numpy as np
from pympler import muppy, summary
foo = np.random.normal(size=(1000, 1000))
all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
summary.print_(sum1)
So, I get this result:
types | # objects | total size
============================ | =========== | ============
numpy.ndarray | 42 | 7.63 MB
str | 20870 | 3.40 MB
dict | 4538 | 2.19 MB
list | 4415 | 1.37 MB
code | 5882 | 830.05 KB
type | 969 | 775.52 KB
tuple | 3821 | 278.77 KB
wrapper_descriptor | 2189 | 188.12 KB
set | 148 | 115.53 KB
builtin_function_or_method | 1442 | 112.66 KB
method_descriptor | 1406 | 109.84 KB
weakref | 1273 | 109.40 KB
abc.ABCMeta | 96 | 94.62 KB
int | 2769 | 81.01 KB
getset_descriptor | 974 | 76.09 KB
But this result aggregates objects by type. So now I know that one or more arrays are causing the memory leak, but I don't know which of them are the source of the problem. I'd like to get a similar list, but with the explicit names of, let's say, the 15 biggest objects.
I've already tried this for the single biggest object:
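from pympler import asizeof
# muppy.sort orders objects by size, so the last element is the largest
sorted_objs = muppy.sort(all_objects)
print(sorted_objs[-1])
print(asizeof.asizeof(sorted_objs[-1]))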
This actually prints the PyObject and the size in bytes. But could I also get its name? That is, "foo" in this example. Thx.
I could not find a way to get the names of the largest objects returned by pympler/muppy using these libraries alone. However, it turns out that you can get all the information relevant to troubleshooting a memory leak, including variable names and object references, using objgraph.
Here is an illustration of how you could proceed:
Get objgraph, for example with pip:
pip3 install objgraph
Plot the reference graph of the three largest objects
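import objgraph
from pympler import muppy
# The three largest objects are the last three in the size-sorted list
biggest_vars = muppy.sort(muppy.get_objects())[-3:]
# Rendering the PNG requires Graphviz to be installed
objgraph.show_backrefs(biggest_vars, filename='backref.png')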
In one example, the memory leak was due to numpy arrays of zeros instantiated in another thread and referenced by two variables named newsound and audio_data. Somehow these arrays of zeros were not deleted when the threads finished.
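If you only need the names as text rather than a rendered graph, a rough sketch using only the standard-library gc module might look like the following. This is not part of pympler or objgraph; the helper name names_referring_to is hypothetical, and it only finds names stored in namespaces such as module globals, plain dicts, and (probably) instance attributes. Local variables inside running functions are not covered.

```python
import gc


def names_referring_to(obj):
    """Return the names under which obj is stored in dict-like namespaces.

    Hypothetical helper: scans the direct referrers reported by the
    garbage collector and collects every string key bound to obj.
    """
    names = set()
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            # Module globals, class namespaces and ordinary dicts.
            namespace = referrer
        else:
            # Assumption: on newer CPython versions an instance may refer
            # to its attributes directly, so also look at its __dict__.
            namespace = getattr(referrer, "__dict__", None)
            if namespace is None:
                continue
        for key, value in list(namespace.items()):
            if value is obj and isinstance(key, str):
                names.add(key)
    return sorted(names)


foo = [0.0] * 1000  # stand-in for a large array
print(names_referring_to(foo))
```

You could pass each element of the size-sorted list from muppy (for example sorted_objs[-1]) to this helper. An empty result usually means the object is held by something other than a named variable or attribute, such as a list, a closure, or a local variable in a running frame, in which case the objgraph back-reference graph is the better tool.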