Here is my test code:
#! /usr/bin/python3
import gc
import ctypes
name = "a" * 50
name_id = id(name)
del name
gc.collect()
print(ctypes.cast(name_id, ctypes.py_object).value)
output:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
In my opinion, gc.collect() should clean up the variable name and its value, so why can I still get the value through name_id after gc.collect()?
You shouldn't expect gc.collect() to do anything here. The gc module only controls the cyclic garbage collector, which is an auxiliary collector: CPython's main memory management strategy is reference counting. The cyclic garbage collector handles reference cycles, and there are no reference cycles here, so gc.collect() won't do anything.
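To make the split between reference counting and the cyclic collector concrete, here is a minimal sketch. It assumes CPython; the Node class and the variable names are made up for illustration, and weakref is used only as a safe way to check whether an object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this demonstration."""


gc.disable()  # keep the automatic cyclic collector from running mid-demo

# No cycle: reference counting frees the object as soon as the name is deleted.
a = Node()
a_ref = weakref.ref(a)
del a
freed_by_refcount = a_ref() is None

# Cycle: the object refers to itself, so its reference count never reaches zero.
b = Node()
b.self_ref = b
b_ref = weakref.ref(b)
del b
alive_before_collect = b_ref() is not None

gc.collect()  # only now is the unreachable cycle found and freed
freed_by_gc = b_ref() is None

gc.enable()
print(freed_by_refcount, alive_before_collect, freed_by_gc)
```

Only the second object needs gc.collect(). The string in the question is not part of any cycle, so the collector has nothing to do with it.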
In my opinion, gc.collect() should clean the variable name and its value,
That is simply not how Python works. The variable ceased to exist with del name, but the object continues to exist, in this case due to compiler optimizations. Python variables are not like C variables: they aren't chunks of memory, they are names that refer to objects in a particular namespace.
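As a small illustration of names versus objects (the names data and alias are invented for this sketch), deleting one name leaves the object alone as long as something else still refers to it:

```python
data = ["x"] * 3
alias = data            # a second name bound to the same list object
same_object = alias is data

del data                # removes the name "data", not the list itself

try:
    data
    name_gone = False
except NameError:
    name_gone = True

object_survives = alias == ["x", "x", "x"]
print(same_object, name_gone, object_survives)
```

In the question's script the "other reference" is not a second variable but the code object's table of constants, which the disassembly makes visible.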
In any case, disassembling the code will give you some insight here:
In [1]: import dis
In [2]: dis.dis("""
...: import gc
...: import ctypes
...:
...: name = "a" * 50
...: name_id = id(name)
...: del name
...: gc.collect()
...: print(ctypes.cast(name_id, ctypes.py_object).value)
...: """)
2           0 LOAD_CONST               0 (0)
            2 LOAD_CONST               1 (None)
            4 IMPORT_NAME              0 (gc)
            6 STORE_NAME               0 (gc)

3           8 LOAD_CONST               0 (0)
           10 LOAD_CONST               1 (None)
           12 IMPORT_NAME              1 (ctypes)
           14 STORE_NAME               1 (ctypes)

5          16 LOAD_CONST               2 ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
           18 STORE_NAME               2 (name)

6          20 LOAD_NAME                3 (id)
           22 LOAD_NAME                2 (name)
           24 CALL_FUNCTION            1
           26 STORE_NAME               4 (name_id)

7          28 DELETE_NAME              2 (name)

8          30 LOAD_NAME                0 (gc)
           32 LOAD_METHOD              5 (collect)
           34 CALL_METHOD              0
           36 POP_TOP

9          38 LOAD_NAME                6 (print)
           40 LOAD_NAME                1 (ctypes)
           42 LOAD_METHOD              7 (cast)
           44 LOAD_NAME                4 (name_id)
           46 LOAD_NAME                1 (ctypes)
           48 LOAD_ATTR                8 (py_object)
           50 CALL_METHOD              2
           52 LOAD_ATTR                9 (value)
           54 CALL_FUNCTION            1
           56 POP_TOP
           58 LOAD_CONST               1 (None)
           60 RETURN_VALUE
So, when your code block was compiled, the CPython compiler noticed that "a" * 50 could be turned into a constant, and so it did. The constants of a code object are kept alive for as long as the code object itself exists (in this case, until the interpreter exits). Since this code object holds a reference to the string object, the string exists the entire time.
So, more explicitly:
In [4]: code = compile("""name = "a" * 50""", filename='foo', mode='exec')
In [5]: code
Out[5]: <code object <module> at 0x7ff7c12495d0, file "foo", line 1>
In [6]: code.co_consts
Out[6]: ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', None)
Note also that Python memory management is complex and pretty opaque. All objects live on a privately managed heap. Just because an object is "released" doesn't mean that the runtime won't simply reuse that bit of memory for objects of the same type (or other suitable types) as needed. Look at this:
In [1]: class Foo: pass
In [2]: import ctypes
In [3]: foo = Foo()
In [4]: id(foo)
Out[4]: 140559250737552
In [5]: del foo
In [6]: foo2 = Foo()
In [7]: id(foo2)
Out[7]: 140559250737680
In [8]: ctypes.cast(140559250737552, ctypes.py_object).value
Out[8]: <prompt_toolkit.lexers.pygments.RegexSync at 0x7fd68035c990>
In [9]: id(foo2)
Out[9]: 140559250737680
In [10]: del foo2
In [11]: ctypes.cast(140559250737680, ctypes.py_object).value
Out[11]: <prompt_toolkit.lexers.pygments.PygmentsLexer at 0x7fd68035ca10>
Notice how you are able to recover some objects in these cases: the IPython interactive shell is creating objects all the time, and the internal heap is happy to reuse that memory.
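The same reuse can be observed without ctypes by comparing ids. This is only a sketch, assuming CPython run as a plain script; whether the address is actually reused depends on allocator state, so the result is not guaranteed either way.

```python
class Foo:
    pass


first = Foo()
first_id = id(first)
del first          # last reference gone; CPython frees the object right away

second = Foo()     # a new object of the same size may land in the freed slot
reused = id(second) == first_id

# Frequently True on CPython, but nothing guarantees it.
print("address reused:", reused)
```

This is why an id is only meaningful while the object it came from is alive: afterwards the same number may identify a completely different object.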
Look what happens in a more bare-bones REPL:
(base) juanarrivillaga@50-254-139-253-static% python
Python 3.7.9 (default, Aug 31 2020, 07:22:35)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> class Foo: pass
...
>>> foo = Foo()
>>> i = id(foo)
>>> del foo
>>> ctypes.cast(i, ctypes.py_object).value
zsh: segmentation fault python
So yeah, this is more like what one might expect: we tried to access a part of memory that had not only been reclaimed by the internal heap but also freed by the Python process, and thus we got a segmentation fault.