python, memory-leaks

muppy: Getting the name of the biggest PyObjects in a Python program sorted by size


I have a memory leak issue with my program. I'm using muppy from the pympler library to print a list of the biggest PyObjects in my program sorted by size in bytes. Here is a reproducible example of how to use it:

$ pip install pympler

import numpy as np
from pympler import muppy, summary

foo = np.random.normal(size=(1000, 1000))
all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
summary.print_(sum1)

So, I get this result:

                       types |   # objects |   total size
============================ | =========== | ============
               numpy.ndarray |          42 |      7.63 MB
                         str |       20870 |      3.40 MB
                        dict |        4538 |      2.19 MB
                        list |        4415 |      1.37 MB
                        code |        5882 |    830.05 KB
                        type |         969 |    775.52 KB
                       tuple |        3821 |    278.77 KB
          wrapper_descriptor |        2189 |    188.12 KB
                         set |         148 |    115.53 KB
  builtin_function_or_method |        1442 |    112.66 KB
           method_descriptor |        1406 |    109.84 KB
                     weakref |        1273 |    109.40 KB
                 abc.ABCMeta |          96 |     94.62 KB
                         int |        2769 |     81.01 KB
           getset_descriptor |         974 |     76.09 KB

But this result aggregates objects by type. So I know that one or more arrays are causing the memory leak, but not which ones. I'd like to get a similar list that shows the explicit names of, say, the 15 biggest objects.

I've already tried this for the single biggest object:

from pympler import asizeof

sorted_objs = muppy.sort(all_objects)  # ascending by size, so the largest is last
print(sorted_objs[-1])
print(asizeof.asizeof(sorted_objs[-1]))

This actually prints the PyObject and the size in bytes. But could I also get its name? That is, "foo" in this example. Thx.


Solution

  • I could not find a way to get the names of the largest objects returned by pympler/muppy using these libraries alone. However, it turns out that you can get all the information relevant to troubleshooting a memory leak using objgraph, including variable names and object references.

    Here is an illustration of how you could proceed:

    1. Get objgraph, for example with pip:

      pip3 install objgraph
      
    2. Plot the reference graph of the three largest objects

      import objgraph
      from pympler import muppy
      biggest_vars = muppy.sort(muppy.get_objects())[-3:]  # three largest objects
      objgraph.show_backrefs(biggest_vars, filename='backref.png')
      

    Below is an example where the memory leak was due to numpy arrays of zeros instantiated in another thread and referenced by two variables named newsound and audio_data. For some reason, these arrays were not freed when the threads finished.

    [Image: dependency graph to troubleshoot memory leak]
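
If a plain-text list is more convenient than a graph, the same idea (rank objects by size, then look at what refers to them) can be roughly sketched with the standard library alone. This is only an illustration under some assumptions, not something pympler or objgraph provides: the helper names `shallow_size`, `names_referring_to` and `largest_named_objects` are made up for this example, sizes are the shallow ones reported by `sys.getsizeof`, only objects tracked by the garbage collector are considered, and names are recovered only from dict namespaces (module globals, instance attributes), not from local variables of running functions.

```python
import gc
import heapq
import sys


def shallow_size(obj):
    """Shallow size in bytes, or 0 if the object cannot report one."""
    try:
        return sys.getsizeof(obj)
    except Exception:
        return 0


def names_referring_to(obj):
    """Return the string dict keys (e.g. global variable names) bound to obj."""
    names = set()
    for referrer in gc.get_referrers(obj):
        # Module and instance namespaces are plain dicts. Locals of running
        # functions live in frame objects and are not covered by this sketch.
        if isinstance(referrer, dict):
            for key, value in list(referrer.items()):
                if value is obj and isinstance(key, str):
                    names.add(key)
    return sorted(names)


def largest_named_objects(n=15):
    """Return (size, type name, names) for the n largest GC-tracked objects."""
    candidates = gc.get_objects()
    largest = heapq.nlargest(n, candidates, key=shallow_size)
    return [
        (shallow_size(obj), type(obj).__name__, names_referring_to(obj))
        for obj in largest
    ]


if __name__ == "__main__":
    foo = [0.0] * 1_000_000  # stand-in for a large array
    for size, type_name, names in largest_named_objects(15):
        print(f"{size:>12} bytes  {type_name:<20} {names}")
```

An object that nothing names directly (for example, one held only inside a list or a closure) shows up with an empty name list; in that case the back-reference graph from objgraph is still the better tool. `asizeof.asizeof` from pympler could be swapped in for `shallow_size` if deep sizes are needed.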