Gio.MemoryInputStream does not free memory when closed


With Python 3.4 on Windows 7, closing a Gio.MemoryInputStream does not free its memory as it should. The test code is:

from gi.repository import Gio
import os, psutil

process = psutil.Process(os.getpid())

for i in range(1, 10):
    input_stream = Gio.MemoryInputStream.new_from_data(b"x" * 10**7)  # 10 MB buffer
    x = input_stream.close_async(2)
    y = int(process.memory_info().rss / 10**6)  # Resident memory of this process, in MB
    print(x, y)

This prints:

True 25
True 35
True 45
True 55
True 65
True 75
True 85
True 95
True 105

This shows that on each iteration the memory used by the program increases by 10 MB, even though the close function returned True. How can the memory be freed once the stream is closed?

Another good solution would be to reuse the stream. But set_data and replace_data raise the following error: 'Data access methods are unsupported. Use normal Python attributes instead'. Fine, but which attribute?

I need an in-memory stream in Python 3.4. I create a PDF file with PyPDF2, and then I want to preview it with Poppler. Due to a bug in Poppler (see "Has anyone been able to use poppler new_from_data in python?") I cannot use the new_from_data function, and would like to use the new_from_stream function instead.


Solution

  • This is a bug in GLib’s Python bindings which can’t be trivially fixed.

    Instead, you should use g_memory_input_stream_new_from_bytes() (Gio.MemoryInputStream.new_from_bytes() in Python), which handles freeing memory differently and shouldn’t suffer from the same bug.


    In more detail, the bug with new_from_data() comes from the introspection annotations, which GLib uses to let language bindings expose all of its API automatically. These annotations do not support the GDestroyNotify parameter of new_from_data(), which must be set to a non-NULL function in order to free the allocated memory passed in through the other arguments. Running your script under gdb shows that pygobject passes NULL for the GDestroyNotify parameter. It can’t do any better, since there is currently no way of expressing that the memory management semantics of the data parameter depend on what’s passed to destroy.
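
A minimal sketch of the suggested workaround, assuming PyGObject is installed. The helper name stream_from_bytes is hypothetical and only for illustration. It wraps the Python bytes in a GLib.Bytes, which is reference counted, so the stream no longer depends on a GDestroyNotify callback to release the buffer. The import guard is there only so the snippet still loads on a machine without PyGObject.

```python
try:
    from gi.repository import Gio, GLib
except ImportError:  # PyGObject is not installed
    Gio = GLib = None


def stream_from_bytes(data):
    """Return a Gio.MemoryInputStream backed by a GLib.Bytes copy of `data`.

    Hypothetical helper name. GLib.Bytes owns its buffer and is reference
    counted, so no destroy callback has to be passed by the caller.
    """
    gbytes = GLib.Bytes.new(data)
    return Gio.MemoryInputStream.new_from_bytes(gbytes)


if Gio is not None:
    for i in range(1, 10):
        input_stream = stream_from_bytes(b"x" * 10**7)  # 10 MB buffer
        closed = input_stream.close(None)  # synchronous close, no cancellable
        # Drop the last Python reference so the stream and its GLib.Bytes
        # can be released before the next iteration.
        del input_stream
```

To check the effect, the psutil measurement from the question can be added back inside the loop. The stream returned by the helper is an ordinary Gio.InputStream, so it should also be usable with Poppler's new_from_stream function mentioned in the question, although that is not tested here.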