How to prevent premature deletion of built-in functions by garbage collector on application exit


Here is a short script with a class that dumps some data to disk upon destruction:

import pickle

class MyClass(object):

    def __init__(self):
        self._some_data = "lorem ipsum dolor sit"

    def __del__(self):
        with open("/some/file/to/dump/data.pickle", 'wb') as ofile:
            pickle.dump(self._some_data, ofile)

my_instance = MyClass()
# script ends, causing garbage collector to clean up and delete my_instance, which triggers __del__()

The issue is that __del__() uses the built-in open() function, which may be deleted by the garbage collector on application exit before my_instance is deleted. This then causes a "NameError: name 'open' is not defined" error. The reason for this problem is described in more detail here (but without a solution): NameError: name 'open' is not defined When trying to log to files

However, while playing around I found a workaround: rebinding the open function to itself by adding the silly-looking line open = open:

import pickle
open = open


class MyClass(object):

    def __init__(self):
        self._some_data = "lorem ipsum dolor sit"

    def __del__(self):
        with open("/some/file/to/dump/data.pickle", 'wb') as ofile:
            pickle.dump(self._some_data, ofile)


my_instance = MyClass()

This will now work and dump self._some_data to disk.

I have multiple questions:

  1. Why precisely does this actually work? My first instinct was that MyClass depends on the local open(), so the garbage collector will delete open() after my_instance. However, in that case I do not understand why the first version does not work, because it similarly depends on the built-in open(), which should also prevent its premature deletion.
  2. Is this workaround actually reliable? I might just have been lucky, and in another version of Python this might not work.
  3. Is there a better way to solve this problem? While there is some elegance in the simplicity of this workaround, this also looks very hacky.

Solution

  • The assumption that MyClass (or its instances) somehow "depends" on open is not entirely correct. What actually happens in __del__ (or any other method or function) is that at the point where open is called, the interpreter looks up the name "open", first in the module's global namespace and then in the builtins namespace, and uses whatever object it finds. As such, the "dependency" on open is only resolved at the time open is called, and only for that call. This usually works out fine, unless interpreter finalization has already begun and what one can ordinarily expect to find in those namespaces is no longer there.
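
The call-time lookup can be checked directly. This is only a minimal sketch: resolve_open is a hypothetical helper, not part of the original script, and the globals() assignment stands in for writing open = open at module level.

```python
import builtins


def resolve_open():
    # "open" is not captured when this function is defined. It is looked up
    # every time this line runs: module globals first, then builtins.
    return open


# Without a module-level binding, the lookup falls through to builtins.
via_builtins = resolve_open() is builtins.open

# Same effect as the module-level line "open = open": a second name,
# stored in this module's namespace, bound to the very same function object.
globals()["open"] = builtins.open

has_module_binding = "open" in globals()
same_object = resolve_open() is builtins.open
```

After the assignment, the function is reachable through two namespaces, which is the situation the workaround creates.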

    With respect to your questions:

    1. The line open = open does not "overwrite" the function with itself. It creates a new binding called "open" in the module's namespace (the first "open" in open = ...); the value assigned to that binding is whatever the interpreter finds when looking up the name "open" (the second "open" in ... = open). Since the module has no such name yet, that lookup falls through to the builtins namespace. The value is, unsurprisingly, the built-in open function, which is now bound under the same name "open" in both the module's namespace and the builtins namespace.
      When your __del__ method calls open, the interpreter searches the module's namespace for the name "open" first and falls back to the builtins namespace only if the name is not found there. That extra binding in the module's namespace is what keeps the open function reachable: there are now two paths to the function's implementation, and even if the set of built-in bindings is being destroyed, the module's namespace still holds a reference to it.

    2. Relying on __del__ being executed at all is never a good idea, so the workaround is not something to build on. In general, a destructor should only care about destroying the object and releasing resources that would otherwise be wasted or blocked while the program continues. You should always assume that the destructor may not execute at all (think abort()).

    3. The atexit module provides a more reliable way to achieve what you want.
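
A minimal sketch of the atexit approach, under some assumptions: the path constructor parameter, the dump method name, and the temporary-directory location are all made up for illustration and do not come from the original script.

```python
import atexit
import os
import pickle
import tempfile


class MyClass(object):

    def __init__(self, path):
        self._some_data = "lorem ipsum dolor sit"
        self._path = path  # hypothetical parameter, not in the original
        # Callbacks registered with atexit run at normal interpreter exit.
        # Registering the bound method also keeps this instance alive
        # until then, so no __del__ is needed.
        atexit.register(self.dump)

    def dump(self):
        with open(self._path, 'wb') as ofile:
            pickle.dump(self._some_data, ofile)


# Placeholder location in a fresh temporary directory.
dump_path = os.path.join(tempfile.mkdtemp(), "data.pickle")
my_instance = MyClass(dump_path)
# script ends; atexit calls my_instance.dump()
```

According to the atexit documentation, registered functions are not called when the process is killed by a signal Python does not handle, when a fatal internal error occurs, or when os._exit() is called, so this is more predictable than __del__ but still not a guarantee.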