
Why do tuples with the same elements, created in different ways, end up at several memory addresses instead of one?


I am learning about memory management in Python. Recently I was exploring how memory addresses differ between mutable and immutable objects. At first I concluded that equal immutable objects share a single memory address as an optimization, while mutable objects always receive a new memory address.

But after asking some people and exploring some code snippets I got results that I am unable to understand. Please take a look at the following code:

x = (1, 2)  # 0x111
y = (1, 2)  # 0x111

print(hex(id(x)))
print(hex(id(y)))

a = tuple([1, 2])  # 0x222
b = tuple((1, 2))  # 0x111
c = tuple(range(1, 3))  # 0x333
d = tuple(_ for _ in x)  # 0x444

print(hex(id(a)))
print(hex(id(b)))
print(hex(id(c)))
print(hex(id(d)))

print(x, y, a, b, c, d)  # (1, 2) (1, 2) (1, 2) (1, 2) (1, 2) (1, 2)

From this code, I would expect all of them to have the same memory address.

Why is there a difference in memory address between these identical tuples?

EDIT: I should also state that I would like to discuss what happens when the code is run as a script from an IDE, not in the Python console.


Solution

  • When you call tuple([1, 2]) (or make any other call that actually has to build a tuple), Python doesn't search through all the tuples it has constructed before in order to return an existing one; it simply builds a new object. (If you wanted something like that, you'd use @functools.cache.) That's why tuple([1, 2]) and the range and generator versions return new objects with new ids. The exception is tuple((1, 2)): calling tuple() on something that is already a tuple is a no-op, so it can just return the same tuple.
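
As a minimal sketch of the point above (make_tuple is a hypothetical helper introduced only for illustration, and the identity results describe CPython's behavior rather than a language guarantee): tuple() on an existing tuple hands back the same object, building from a list creates a new one, and @functools.cache is one way to get "return the one I already made" semantics.

```python
import functools


@functools.cache
def make_tuple(*items):
    # Hypothetical helper: results are cached by argument values,
    # so equal calls return the very same tuple object.
    return tuple(items)


t = (1, 2)
same = tuple(t)        # already a tuple: CPython returns t itself
fresh = tuple([1, 2])  # built from a list: a new object, equal to t

cached_a = make_tuple(1, 2)
cached_b = make_tuple(1, 2)

print(same is t, fresh is t, cached_a is cached_b)
```

functools.cache requires Python 3.9 or later; on older versions functools.lru_cache(maxsize=None) behaves the same way.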

    Additionally, when Python (well, CPython, since this is an implementation detail) loads a module (e.g. from disk), it compiles it into bytecode, and that process includes a few optimizations. One of them is that equal constants within a single compiled unit are stored only once.

    If we call the dis disassembler on your code, we see it begins with

    $ python3.10 -m dis so75512629.py
      1           0 LOAD_CONST               0 ((1, 2))
                  2 STORE_NAME               0 (x)
    
      2           4 LOAD_CONST               0 ((1, 2))
                  6 STORE_NAME               1 (y)
    

    i.e. the same zeroth const is stored under two different names, and later on

      8          52 LOAD_NAME                5 (tuple)
                 54 LOAD_CONST               0 ((1, 2))
                 56 CALL_FUNCTION            1
                 58 STORE_NAME               7 (b)
    

    we're calling tuple on that same const (and as discussed above, tuple(t) for a tuple t is a no-op).
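
The same effect can be observed without the disassembler. The sketch below assumes CPython and compiles a shortened version of the script as one unit; the source string and the "<example>" filename are made up for the illustration.

```python
# A shortened stand-in for the script, compiled as a single unit.
source = "x = (1, 2)\ny = (1, 2)\nb = tuple((1, 2))\n"
code = compile(source, "<example>", "exec")

# Collect every constant in the code object that equals (1, 2).
tuple_consts = [c for c in code.co_consts if c == (1, 2)]

namespace = {}
exec(code, namespace)

x, y, b = namespace["x"], namespace["y"], namespace["b"]
print(len(tuple_consts), x is y, b is x)
```

On CPython, x, y and b all end up referring to the one tuple constant stored in the code object.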

    This doesn't hold when entering separate statements in a REPL session, as each one is compiled separately:

    >>> a = (1, 2)
    >>> b = (1, 2)
    >>> id(a)
    4318599488
    >>> id(b)
    4319249472
    >>>
    

    However, the optimization does apply within a single compiled suite:

    >>> c = [(1, 2), (1, 2)]
    >>> id(c[0])
    4319257472
    >>> id(c[1])
    4319257472
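
The REPL behavior can be imitated in a script by compiling each statement on its own. This is a rough sketch assuming CPython; run is a hypothetical helper, and whether separately compiled constants end up as distinct objects is an implementation detail that may differ between builds.

```python
def run(source):
    # Hypothetical helper: compile and execute one snippet as its own unit,
    # roughly what the REPL does for each statement entered.
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


first = run("a = (1, 2)")
second = run("b = (1, 2)")
together = run("c = [(1, 2), (1, 2)]")

# Separate compilation units: typically two distinct objects.
print(first["a"] is second["b"])
# One compilation unit: both list elements are the same constant.
print(together["c"][0] is together["c"][1])
```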