I don't have a code example, but I'm curious whether it's possible to write Python code that results in essentially a memory leak.
It is possible, yes.
It depends on what kind of memory leak you are talking about. Within pure Python code, it's not possible to "forget to free" memory as you can in C, but it is possible to leave a reference hanging somewhere. Some examples:
An unhandled traceback object that is keeping an entire stack frame alive, even though the function is no longer running
    import sys

    while game.running():
        try:
            key_press = handle_input()
        except SomeException:
            etype, evalue, tb = sys.exc_info()
            # Do something with tb like inspecting or printing the traceback
In this silly example of a game loop, we assigned tb to a local. We had good intentions, but this tb contains frame information about the stack of whatever was happening in handle_input, all the way down to potentially very deep calls, and anything in those stacks. Presuming your game continues, this tb is kept alive even in your next call to handle_input, and maybe forever. The docs for exc_info now talk about this potential circular reference issue and recommend simply not assigning tb if you don't absolutely need it. If you only need to get a traceback, consider e.g. traceback.format_exc instead.
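A minimal sketch of that advice, assuming a hypothetical handle_input that always fails (both function names below are illustrative only): format the traceback to a string inside the handler, so no traceback object is ever bound to a local.

```python
import traceback


def handle_input():
    # Hypothetical stand-in for the real input handler; it always fails here.
    raise ValueError("bad key")


def poll_once():
    try:
        return handle_input()
    except ValueError:
        # format_exc() returns a plain string, so no traceback object
        # (and none of the frames it references) is stored in a local.
        return traceback.format_exc()


report = poll_once()
```

The string in report can be logged or printed without holding on to any stack frames.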
Storing values in a class or global scope instead of instance scope, and not realizing it.
This one can happen in insidious ways, but often happens when you define mutable types in your class scope.
    class Money:
        name = ''
        symbols = []   # This is the dangerous line here

        def set_name(self, name):
            self.name = name

        def add_symbol(self, symbol):
            self.symbols.append(symbol)
In the above example, say you did:

    m = Money()
    m.set_name('Dollar')
    m.add_symbol('$')
You'll probably find this particular bug quickly. What happened is that you put a mutable value at class scope, and even though you correctly access it at instance scope, the lookup is actually "falling through" to the class object's __dict__, so every Money instance appends to the same symbols list.
Used in certain contexts, this could cause your application's heap to grow forever, and would cause issues in, say, a production web application that didn't restart its processes occasionally.
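To make the "falling through" concrete, here is a small sketch contrasting the two placements. The class names SharedMoney and PerInstanceMoney are made up for this illustration and are not part of the example above.

```python
class SharedMoney:
    symbols = []  # one list stored on the class, shared by every instance

    def add_symbol(self, symbol):
        self.symbols.append(symbol)


class PerInstanceMoney:
    def __init__(self):
        self.symbols = []  # a fresh list stored on each instance

    def add_symbol(self, symbol):
        self.symbols.append(symbol)


a, b = SharedMoney(), SharedMoney()
a.add_symbol('$')
shared_seen_by_b = list(b.symbols)  # b sees the symbol added through a

c, d = PerInstanceMoney(), PerInstanceMoney()
c.add_symbol('$')
fixed_seen_by_d = list(d.symbols)   # d is unaffected by c
```

Creating the list in __init__ ties its lifetime to the instance, so it is released along with the instance instead of living as long as the class does.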
Cyclic references in classes which also have a __del__ method.
Author's note: as of Python 3.4, this issue is mostly solved by PEP 442.
Ironically, the existence of a __del__ method made it impossible (in Python 2 and early versions of Python 3) for the cyclic garbage collector to clean an instance up. Say you wanted a destructor for finalization purposes:
    class ClientConnection:
        def __del__(self):
            if self.socket is not None:
                self.socket.close()
                self.socket = None
Now this works fine on its own, and you may be led to believe it's being a good steward of OS resources to ensure the socket is 'disposed' of.
However, if ClientConnection kept a reference to, say, User, and User kept a reference to the connection, you might be tempted to say that on cleanup, let's have the user de-reference the connection. This is actually the flaw, however: the cyclic GC doesn't know the correct order of operations and cannot clean it up.
The solution is to ensure you do cleanup on, say, disconnect events by calling some sort of close method, but name that method something other than __del__.
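One way this could look, as a sketch rather than a definitive implementation: an explicit, idempotent close method, optionally wrapped in a context manager so it runs even when an exception is raised. The FakeSocket class and the constructor signature are assumptions added only so the example runs on its own.

```python
class ClientConnection:
    def __init__(self, sock):
        # Assumed constructor; the original example does not show one.
        self.socket = sock

    def close(self):
        # Explicit cleanup that is safe to call more than once.
        if self.socket is not None:
            self.socket.close()
            self.socket = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.close()
        return False  # do not swallow exceptions


class FakeSocket:
    """Hypothetical stand-in for a real socket object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


sock = FakeSocket()
with ClientConnection(sock) as conn:
    pass  # use the connection here

conn.close()  # a second call is harmless
```

Because cleanup no longer depends on when (or whether) the garbage collector finalizes the object, a reference cycle between the connection and a user object cannot delay closing the socket.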
Poorly implemented C extensions, or not properly using C libraries as they are designed.
In Python, you trust in the garbage collector to throw away things you aren't using. But if you use an extension that wraps a C library, the majority of the time you are responsible for making sure you explicitly close or de-allocate resources. Mostly this is documented, but a Python programmer who is used to not having to do this explicit de-allocation might throw away the handle to that library or an object within without knowing that resources are being held.
Scopes which contain closures that contain a whole lot more than you could've anticipated
    class User:
        def set_profile(self, profile):
            def on_completed(result):
                if result.success:
                    self.profile = profile

            self._db.execute(
                change={'profile': profile},
                on_complete=on_completed
            )
In this contrived example, we appear to be using some sort of 'async' call that will call us back at on_completed when the DB call is done (the implementation could've been promises; it ends up with the same outcome).
What you may not realize is that the on_completed closure binds a reference to self in order to execute the self.profile assignment. Now, perhaps the DB client keeps track of active queries and pointers to the closures to call when they're done (since it's async), and say it crashes for whatever reason. If the DB client doesn't correctly clean up callbacks etc., the DB client now has a reference to on_completed, which has a reference to User, which keeps a _db: you've now created a circular reference that may never get collected.
(Even without a circular reference, the fact that closures bind locals and even instances sometimes may cause values you thought were collected to be living for a long time, which could include sockets, clients, large buffers, and entire trees of things)
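One possible mitigation, sketched under assumptions: have the callback hold only a weak reference to the instance, so a forgotten callback cannot keep the User alive. FakeDB is a hypothetical client that never drops its callbacks, and the User constructor is likewise invented for the demo.

```python
import gc
import weakref


class FakeDB:
    """Hypothetical client that stores callbacks and never forgets them."""

    def __init__(self):
        self.pending = []

    def execute(self, change, on_complete):
        self.pending.append(on_complete)


class User:
    def __init__(self, db):
        self._db = db
        self.profile = None

    def set_profile(self, profile):
        self_ref = weakref.ref(self)  # does not keep the User alive

        def on_completed(result):
            user = self_ref()  # None if the User has already been freed
            if user is not None and result.success:
                user.profile = profile

        self._db.execute(change={'profile': profile}, on_complete=on_completed)


db = FakeDB()
user = User(db)
user.set_profile('admin')

probe = weakref.ref(user)
del user
gc.collect()
user_was_freed = probe() is None
```

Note that the closure still holds a strong reference to profile, so this only breaks the link back to the User; anything else the callback captures stays alive for as long as the DB client keeps the callback.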
Default parameters which are mutable types
    import time

    def foo(a=[]):
        a.append(time.time())
        return a
This is a contrived example, but one could be led to believe that the default value of a being an empty list means each call appends to a fresh list, when it is in fact a reference to the same list on every call. Like the earlier Money example, this might cause unbounded growth without you knowing that you did that.
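The usual way around this is a None sentinel, so a new list is created on each call. The sketch below puts the two side by side; the names leaky_foo and safe_foo are illustrative only.

```python
import time


def leaky_foo(a=[]):
    # The default list is created once, when the function is defined,
    # and reused on every call that omits the argument.
    a.append(time.time())
    return a


def safe_foo(a=None):
    # A fresh list is created on each call that omits the argument.
    if a is None:
        a = []
    a.append(time.time())
    return a


leaky_first = leaky_foo()
leaky_second = leaky_foo()  # same list object, now holding two timestamps

safe_first = safe_foo()
safe_second = safe_foo()    # separate lists, one timestamp each
```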
(Note from August 2023 Update: This post was originally written in 2010 and the information within is still largely valid today, I just did some minor updates to the URL references and made sure the code examples are valid in both Python 2 & Python 3)