Tags: python, memory-management, garbage-collection, cython

Garbage collection while running a binary executable program converted from Python using Cython


I have converted a Python program to a binary using Cython. The converted program also calls some C functions.

What I have noticed is that while the program runs continuously, its memory consumption increases steadily, drops after reaching some threshold, then increases again, and the cycle repeats.

I may be wrong, but it seems to me that this kind of behaviour happens only with a garbage collector. How does the garbage collector come into the picture in a binary program?

If the standalone binary built from a Python program runs outside the Python interpreter, then what invokes the garbage collector?


Solution

  • I think this question is based on a few misunderstandings of what Cython actually does.

    The code Cython generates is not standalone. Instead, large chunks of it work by calling functions in libpython. These functions are exactly the same functions as the Python interpreter calls when interpreting normal Python code. Therefore, you should expect that things like the garbage collector will behave in exactly the same way. The majority of variables in Cython are still PyObjects - exactly the same structures as Python uses, and these are still allocated in the same way: by calling PyObject_New. The documentation describes the garbage collector trigger as:

    When the number of allocations minus the number of deallocations exceeds threshold0, collection starts

    and since Cython uses the same allocation and deallocation mechanism for PyObjects as interpreted Python code, there's no reason to believe this will behave any differently.
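
As a rough illustration of the trigger quoted above, the sketch below reads the collector's thresholds and its current counters through the standard `gc` module. The variable names (`threshold0`, `kept`) are only illustrative, and the exact bookkeeping behind the counters differs between CPython versions, so treat the printed numbers as indicative rather than exact.

```python
import gc

# The collector keeps one threshold per generation; the first value is
# the "threshold0" mentioned in the documentation quote.
thresholds = gc.get_threshold()
threshold0 = thresholds[0]
print("thresholds:", thresholds)

# gc.get_count() returns the current per-generation counters. The first
# entry is compared against threshold0 to decide when a collection starts.
counts_before = gc.get_count()
print("counts before allocating:", counts_before)

# Allocate some container objects and keep them alive. Lists are tracked
# by the collector, so they normally move the first counter.
kept = [[] for _ in range(100)]

counts_after = gc.get_count()
print("counts after allocating: ", counts_after)
```

The same calls should be usable from a module compiled with Cython, since, as described above, the compiled code still goes through libpython.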

    In addition, you should actually expect many Cython programs to be running significant amounts of interpreted Python code anyway. If you import a module in Cython then that module will be run in the standard way. If that module is a normal Python module, it will be run in the interpreter. You really can't separate Cython from libpython or the Python interpreter (which is part of libpython).


    Finally, it isn't at all clear that the behaviour you describe is the garbage collector: it could simply be the normal reference counting mechanism. It's quite common for a single object to hold references to large numbers of other objects. When that single object is destroyed (for example in each iteration of a loop), it ends up freeing most of the other objects it holds references to. If you want to know whether it's actually the GC running, you can look at gc.get_stats() (Python 3.4 upwards).
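
One way to tell the two mechanisms apart is sketched below, assuming CPython. It disables the cyclic collector, destroys an object that owns many children, and compares the collection counts from `gc.get_stats()` before and after. The `Holder` class and the `collections_so_far` helper are hypothetical names made up for this example, not part of any library.

```python
import gc
import weakref


class Holder:
    """Hypothetical container that owns many child objects."""

    def __init__(self, n):
        self.children = [[i] for i in range(n)]


def collections_so_far():
    # gc.get_stats() returns one dict per generation; "collections" is the
    # number of times that generation has been collected.
    return sum(gen["collections"] for gen in gc.get_stats())


gc.disable()  # rule out automatic cyclic collections during the experiment
try:
    before = collections_so_far()
    holder = Holder(10_000)
    probe = weakref.ref(holder)  # lets us observe when the holder is freed
    del holder  # last reference gone: reference counting frees it right away
    after = collections_so_far()
finally:
    gc.enable()

freed_by_refcount = probe() is None
gc_ran = after != before
print("holder freed without the collector:", freed_by_refcount)
print("collector ran during the experiment:", gc_ran)
```

In a long-running program, logging `collections_so_far()` alongside memory usage is a simple way to check whether the drops in memory line up with collector runs or happen independently of them.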