
Memory leak in my PyGObject project - but seems to be outside of Python


(rewritten after new findings)

I'm hunting a memory leak in a piece of PyGObject code and trying to understand it.

It's reproducible, at least here (Debian stable), with the following code:

import gi
gi.require_version("Gtk", "3.0")
from gi.repository import GLib
from gi.repository import Gtk


def foo(labels):
    for label in labels:
        label.props.label = 100000 * "x"
    GLib.timeout_add(1000, foo, labels)


def main():
    labels = []
    box = Gtk.Box(visible=True)
    for _ in range(40):
        label = Gtk.Label(visible=False)
        labels.append(label)
        box.add(label)
    window = Gtk.Window(visible=True)
    window.add(box)
    GLib.idle_add(foo, labels)
    GLib.MainLoop().run()


if __name__ == "__main__":
    main()

You may need to adapt some of the numbers, but with these values the growth in memory consumption becomes visible quickly: when I start it, the process reaches 2 GB in less than 8 minutes.

There is nothing visible in the guppy memory profiler. I can take heap snapshots at any time, and they always look more or less the same. Also heapu() for unreachable-but-alive objects does not show anything suspicious.
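
One way to confirm that the growth happens outside the Python-visible heap is to log the process's resident set size (as the kernel sees it) next to what the Python-level profiler reports. A minimal sketch using only the standard library, assuming a Unix-like system (the resource module is not available on Windows):

```python
import resource


def peak_rss():
    # ru_maxrss is the peak resident set size of this process:
    # kibibytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


# Poll this periodically (e.g. from a GLib.timeout_add callback) and log it
# alongside guppy's heap() total: if peak_rss() keeps climbing while the
# Python heap report stays flat, the leak lives in native code.
print("peak RSS:", peak_rss())
```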

The same happens with other variants as well, e.g. here with threading instead (and with just one label):

import time


def foo(label):
    label.props.label = 1000000 * "x"


def bar(label):
    while True:
        time.sleep(1)
        GLib.idle_add(foo, label)


def main():
    box = Gtk.Box(visible=True)
    label = Gtk.Label(visible=False)
    box.add(label)
    window = Gtk.Window(visible=True)
    window.add(box)
    import threading
    threading.Thread(target=bar, args=(label,)).start()
    GLib.MainLoop().run()

It does not happen when I use .set_label() instead of .props.label = ....

update:

I've reported it: https://gitlab.gnome.org/GNOME/pygobject/-/issues/520

original text was:

I'm again having a memory leak in my PyGObject project. But it appears to be outside of Python.

I have a tiny GTK application. It creates a window and a main loop as usual, then calls a function inside the main loop using GLib.idle_add. This function removes all widgets from the window, creates a few Gtk.Labels (with some text; e.g. a random number), puts them in a Gtk.Box, and puts that box into the window. Then it uses GLib.timeout_add to schedule its own invocation again one millisecond later.

So it constantly creates new stuff and removes the old, while always showing the same number (and kind) of widgets. I would expect to see constant memory consumption.

That application exists solely so I can analyze a particular memory leak; the actual code does all that in a much more indirect and complicated way, and I suspect it wouldn't be easy to trim it down further for posting. :) I know it would be more straightforward to create the UI elements once and then update them, but that's not the point of this experiment.

When I run it, the process memory consumption starts at 40 MB and steadily climbs. I've seen it climb up to 2 GB before I stopped it. When I use guppy for memory profiling, everything looks fine. Calling guppy's heap() always shows about the same heap content. No leak. The same goes for heapu(), i.e. for unreachable-but-alive objects. No leak.

But in the end, it's leaking somewhere. So, am I doing something that leads to leaks on the GTK side only?!

The question:

What could be good ways to analyze this issue further? I'm rather inexperienced when it comes to all the C-level tooling (gdb, valgrind, and so on) and to if/how it can interact with my Python code. Any hint is appreciated.


Solution

  • If you have access to a Linux system, you can use https://github.com/vmware/chap for such problems, as can be seen in this case.

    It is free and open source, and well suited to your program because it recognizes both Python allocations and ones made directly via libc malloc.

    In any case, the leak in at least your first program is readily reproduced on Linux.

    I copied your code into a file called usegtk.py and started your program in the background. The process ran as pid 158483:

    $ python3 usegtk.py &
    [1] 158483
    

    To make sure that the core had sufficient information for chap to gather mangled symbol names, I set the coredump_filter for pid 158483 to 0x37:

    $ echo 0x37 >/proc/158483/coredump_filter
    
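    For reference, 0x37 is a bit mask over mapping types. The sketch below decodes which mapping types it asks the kernel to include in the dump; the bit meanings are taken from the core(5) man page:

```python
# Bit meanings of /proc/<pid>/coredump_filter, per the core(5) man page.
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
}


def decode(mask):
    return [name for bit, name in COREDUMP_FILTER_BITS.items()
            if mask & (1 << bit)]


# 0x37 = 0b0110111: everything except file-backed shared mappings and
# shared huge pages. Bit 2 (file-backed private mappings) and bit 4
# (ELF headers) are what let chap recover symbol names from the core.
print(decode(0x37))
```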

    Then I waited a bit, gathered a core for that process using gcore, and killed the process, since it was growing quickly and I didn't need it anymore.

    Then I opened the core in chap and waited for the chap prompt:

    $ chap core.158483
    chap> 
    

    Use of summarize writable from the chap prompt showed that the biggest use of memory was for "libc malloc main arena pages" and that much less memory was used for "python arena".

    chap> summarize writable
    1 ranges take 0x2a847000 bytes for use: libc malloc main arena pages
    95 ranges take 0x100f2000 bytes for use: used by module
    3 ranges take 0x1800000 bytes for use: used pthread stack
    19 ranges take 0x4c0000 bytes for use: python arena
    3 ranges take 0x1a4000 bytes for use: libc malloc mmapped allocation
    3 ranges take 0x63000 bytes for use: libc malloc heap
    1 ranges take 0x21000 bytes for use: main stack
    15 ranges take 0x1d000 bytes for use: unknown
    140 writable ranges use 0x3c83e000 (1,015,275,520) bytes.
    

    Use of count used and count leaked from the chap prompt showed that used allocations, and more specifically leaked allocations, dominated the memory usage:

    chap> count used
    78609 allocations use 0x2ab84248 (716,718,664) bytes.
    chap> count leaked
    7056 allocations use 0x271c52c0 (656,167,616) bytes.
    

    Use of summarize leaked showed that by far the most memory consumed by leaked allocations was by allocations of size 0x186a8, for which chap did not recognize the type:

    chap> summarize leaked
    Unrecognized allocations have 7020 instances taking 0x271c4f10(656,166,672) bytes.
       Unrecognized allocations of size 0x186a8 have 6561 instances taking 0x271c17a8(656,152,488) bytes.
       Unrecognized allocations of size 0x18 have 354 instances taking 0x2130(8,496) bytes.
       Unrecognized allocations of size 0x28 have 53 instances taking 0x848(2,120) bytes.
       Unrecognized allocations of size 0x38 have 39 instances taking 0x888(2,184) bytes.
       Unrecognized allocations of size 0x88 have 7 instances taking 0x3b8(952) bytes.
       Unrecognized allocations of size 0x48 have 6 instances taking 0x1b0(432) bytes.
    Signature 7f88fe8500a0 has 2 instances taking 0x70(112) bytes.
    Signature 7f88fce30930 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce31610 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce31690 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce31760 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce317e0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce31860 has 1 instances taking 0x28(40) bytes.
    Signature 7f88fce32370 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce32450 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce32460 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce32470 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce34f40 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce36260 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce362e0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce36e20 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce36ea0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce37630 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce376b0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce37cc0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39830 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce398e0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39990 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39a10 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39a90 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39b10 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce39e30 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3a180 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3a250 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3a2d0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3b8e0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3b960 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3c6d0 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3c830 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3c960 has 1 instances taking 0x18(24) bytes.
    Signature 7f88fce3cab0 has 1 instances taking 0x18(24) bytes.
    7056 allocations use 0x271c52c0 (656,167,616) bytes.
    

    There are several ways to understand a leaked allocation that chap doesn't recognize. One is to look at it for clues. In this case we are interested mostly in those allocations of size 0x186a8. We can sample them and look at the start of the elements in the sample:

    chap> describe leaked /size 186a8 /geometricSample 100 /showUpTo 100 /showAscii true
    Unreferenced allocation at 2afbd80 of size 186a8
    
     0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    20: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    40: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    60: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    80: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    a0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    c0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    e0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Unreferenced allocation at 3f1c090 of size 186a8
    
     0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    20: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    40: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    60: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    80: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    a0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    c0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    e0: 7878787878787878 7878787878787878 7878787878787878 7878787878787878   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    2 allocations use 0x30d50 (200,016) bytes.
    

    Given the run of 'x' characters in the allocations, and given that 0x186a8 = 100,008, which is a reasonable size for an allocation that libc malloc would return for a request of 100,000 bytes, we can reasonably guess that the following statement is the one causing the leak:

        label.props.label = 100000 * "x"
    
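    The size arithmetic is easy to check: the reported allocation size is the 100,000 'x' bytes plus a small amount of rounding/overhead (the 8-byte figure below is just the observed difference, not a claim about glibc internals):

```python
# The leaked allocation size chap reported, and the string length
# used in the test program.
leaked_size = 0x186a8
requested = len(100000 * "x")

print(leaked_size)              # decimal value of 0x186a8
print(leaked_size - requested)  # bytes of rounding/overhead
```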

    To fix the leak, one can change the above line to the following:

        label.set_label(100000 * "x")
    

    If you run the modified program, you will note that it does not grow. Using chap on two cores from a single process running that modified program shows that there is still a mild leak, but it apparently happens only at startup. Here is what chap reports as leaked for both cores from the new program:

    chap> summarize leaked
    Unrecognized allocations have 464 instances taking 0x35a0(13,728) bytes.
       Unrecognized allocations of size 0x18 have 385 instances taking 0x2418(9,240) bytes.
       Unrecognized allocations of size 0x28 have 43 instances taking 0x6b8(1,720) bytes.
       Unrecognized allocations of size 0x38 have 22 instances taking 0x4d0(1,232) bytes.
       Unrecognized allocations of size 0x88 have 7 instances taking 0x3b8(952) bytes.
       Unrecognized allocations of size 0x48 have 6 instances taking 0x1b0(432) bytes.
       Unrecognized allocations of size 0x98 have 1 instances taking 0x98(152) bytes.
    Pattern %ContainerPythonObject has 31 instances taking 0x618(1,560) bytes.
       Matches of size 0x38 have 20 instances taking 0x460(1,120) bytes.
       Matches of size 0x28 have 11 instances taking 0x1b8(440) bytes.
    495 allocations use 0x3bb8 (15,288) bytes.
    

    One could track down the remaining leaks, but they don't seem all that interesting: they appear to happen only at startup and don't consume much memory.