Understanding reference counts in Python


I'm trying to understand how reference counting works in Python. I created a variable a and assigned the value 10 to it, so a refers to the memory location where the int object 10 is stored. When I ask for the reference count of a and of 10, I get two different numbers. If a refers to the same object as 10, why are the reference counts different?

>>> import sys
>>> sys.getrefcount(10)
12
>>> a = 10
>>> sys.getrefcount(10)
13
>>> sys.getrefcount(a)
11

Solution

  • When you call sys.getrefcount(10) directly, the call itself adds references to 10. There's one reference for the literal 10 at the call site, and at least one more whose source is less obvious (it is traced in the update at the end of this answer).
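The way each extra reference shows up in the count can be sketched with a freshly created object instead of 10. This is only an illustration: Token and holders are made-up names, and a private object is used because small integers are shared across the interpreter (and on newer CPython versions may report a fixed, very large count). Absolute numbers vary between versions, so only the differences are compared:

```python
import sys


class Token:
    """Hypothetical throwaway class; its instances are not shared or cached."""


def refcount_deltas():
    obj = Token()
    # The reported value may include the temporary reference made by the call.
    baseline = sys.getrefcount(obj)

    holders = [obj, obj]            # two additional references to the same object
    with_holders = sys.getrefcount(obj)

    holders.clear()                 # drop those two references again
    after_clear = sys.getrefcount(obj)

    # Return differences only; absolute counts are version-dependent.
    return with_holders - baseline, after_clear - baseline


print(refcount_deltas())
```

Each container slot that holds the object counts as one reference, which is the same mechanism by which a constants table can add to the count of 10.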

    More detailed answer: When you run a statement in the interactive prompt, that statement is compiled into bytecode, which is then exec'd by the interpreter. The bytecode is stored in a code object, which you can inspect by compiling a statement yourself with the compile() builtin:

    >>> a = 10
    >>> c = compile('sys.getrefcount(10)', '<stdin>', 'single')
    >>> c
    <code object <module> at 0x7f4def343270, file "<stdin>", line 1>
    

    We can use the dis module to inspect the compiled bytecode:

    >>> import dis
    >>> dis.dis(c)
      1           0 LOAD_NAME                0 (sys)
                  2 LOAD_ATTR                1 (getrefcount)
                  4 LOAD_CONST               0 (10)
                  6 CALL_FUNCTION            1
                  8 PRINT_EXPR
                 10 LOAD_CONST               1 (None)
                 12 RETURN_VALUE
    

    Just before CALL_FUNCTION you can see the instruction LOAD_CONST, displayed with the value 10. But how does the interpreter know that 10 is the constant to load? The actual bytecode instruction is LOAD_CONST 0, where 0 is an index into a table of constants stored in the code object:

    >>> c.co_consts
    (10, None)
    

    So this is where one of the new references to 10 lives (temporarily).
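The index lookup described above can be checked programmatically. The following is a rough sketch, not part of the original answer: load_const_targets is a hypothetical helper, and a larger literal (123456789) is assumed in place of 10 because the exact instruction chosen for small integers can differ between CPython versions. The statement is only compiled, never executed:

```python
import dis

# Hypothetical statement; the large literal is stored as an ordinary constant.
SOURCE = "sys.getrefcount(123456789)"


def load_const_targets(source):
    """Pair each LOAD_CONST index with the object it selects from co_consts."""
    code = compile(source, "<demo>", "single")
    targets = []
    for instr in dis.get_instructions(code):
        if instr.opname == "LOAD_CONST":
            # instr.arg is the index; co_consts[instr.arg] is the object loaded.
            targets.append((instr.arg, code.co_consts[instr.arg]))
    return code.co_consts, targets


consts, targets = load_const_targets(SOURCE)
print(consts)
print(targets)
```

Because co_consts is a tuple, it holds a reference to every constant in it for as long as the code object is alive.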

    Whereas if we do:

    >>> c2 = compile('sys.getrefcount(a)', '<stdin>', 'single')
    >>> dis.dis(c2)
      1           0 LOAD_NAME                0 (sys)
                  2 LOAD_ATTR                1 (getrefcount)
                  4 LOAD_NAME                2 (a)
                  6 CALL_FUNCTION            1
                  8 PRINT_EXPR
                 10 LOAD_CONST               0 (None)
                 12 RETURN_VALUE
    

    Instead of LOAD_CONST there's just a LOAD_NAME, which loads whatever a happens to point to at run time. The object 10 itself is not referenced anywhere in the code object.
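The difference between the two code objects can also be seen without disassembling them, by comparing their constant and name tables. A minimal sketch, assuming a hypothetical helper constants_and_names and the same larger stand-in literal (123456789) so the result does not depend on how a given CPython version treats small integers:

```python
def constants_and_names(source):
    """Compile a statement (without running it) and return its lookup tables."""
    code = compile(source, "<demo>", "single")
    return code.co_consts, code.co_names


# Literal argument: the value itself is stored in the code object.
literal_consts, literal_names = constants_and_names("sys.getrefcount(123456789)")

# Name argument: only the string 'a' is stored; the value is looked up at run time.
name_consts, name_names = constants_and_names("sys.getrefcount(a)")

print(literal_consts, literal_names)
print(name_consts, name_names)
```

Only the first code object keeps the integer alive through its co_consts; the second merely records the name to look up.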

    Update: The source of the second reference is pretty obscure, but it comes from the AST parser, which uses an Arena structure for efficient memory management of AST nodes and the like. The arena also maintains a list (as in an actual Python list) of Python objects parsed in the AST. For numbers, that happens here: https://github.com/python/cpython/blob/fee96422e6f0056561cf74fef2012cc066c9db86/Python/ast.c#L2144 (where PyArena_AddPyObject adds the object to said list). IIUC this list exists just to ensure that literals parsed from the AST have at least one reference held somewhere.

    In the actual C code for compiling and running interactive statements the arena isn't freed until after the compiled statement has been executed, at which point that second extra reference goes away.
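The arena itself is not reachable from Python, but the same "reference lives as long as its holder" behaviour can be observed with the code object's constants table. This is a sketch under assumptions: constant_refcount_drop is a hypothetical helper, a large literal is used so that the int is an ordinary, non-shared object, and a standard (GIL-enabled) CPython build is assumed:

```python
import sys


def constant_refcount_drop():
    # Hypothetical statement; compiling it creates a fresh int object
    # that is stored in the new code object's co_consts tuple.
    code = compile("x = 123456789012", "<demo>", "exec")
    const = [c for c in code.co_consts if isinstance(c, int)][0]

    while_code_alive = sys.getrefcount(const)
    del code                        # frees the code object and its co_consts
    after_code_freed = sys.getrefcount(const)

    # How many references disappeared together with the code object.
    return while_code_alive - after_code_freed


print(constant_refcount_drop())
```

The count drops once the holder is freed, which mirrors what happens to the extra reference on 10 when the interactive statement's code object and arena are released.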