Tags: python, python-2.7, python-3.x, difference, iterable-unpacking

What is with this change of unpacking behavior from Python 2 to Python 3?


Yesterday I came across this odd unpacking difference between Python 2 and Python 3, and could not find an explanation after a quick Google search.

Python 2.7.8

a = 257
b = 257
a is b # False

a, b = 257, 257
a is b # False

Python 3.4.2

a = 257
b = 257
a is b # False

a, b = 257, 257
a is b # True

I know it probably does not affect the correctness of a program, but it does bug me a little. Could anyone give some insight into this difference in unpacking?


Solution

  • This behaviour is at least in part to do with how the interpreter does constant folding and how the REPL executes code.

    First, remember that CPython first compiles code (to an AST and then to bytecode) and only then evaluates the bytecode. During compilation, the compiler collects the immutable constants it finds, caches them, and deduplicates them. So if it sees

    a = 257
    b = 257
    

    inside a single compilation unit, it will bind a and b to the same object:

    import dis
    
    def f():
        a = 257
        b = 257
    
    dis.dis(f)
    #>>>   4           0 LOAD_CONST               1 (257)
    #>>>               3 STORE_FAST               0 (a)
    #>>>
    #>>>   5           6 LOAD_CONST               1 (257)
    #>>>               9 STORE_FAST               1 (b)
    #>>>              12 LOAD_CONST               0 (None)
    #>>>              15 RETURN_VALUE
    

    Note the LOAD_CONST 1. The 1 is the index into co_consts:

    f.__code__.co_consts
    #>>> (None, 257)
    

    So these both load the same 257. Why doesn't this occur with:

    $ python2
    Python 2.7.8 (default, Sep 24 2014, 18:26:21) 
    >>> a = 257
    >>> b = 257
    >>> a is b
    False
    
    $ python3
    Python 3.4.2 (default, Oct  8 2014, 13:44:52) 
    >>> a = 257
    >>> b = 257
    >>> a is b
    False
    

    ?

    In the REPL, each line is a separate compilation unit, and the deduplication cannot happen across units. It works similarly to

    compile a = 257
    run     a = 257
    compile b = 257
    run     b = 257
    compile a is b
    run     a is b
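
The compile/run sequence above can be imitated outside the REPL. This is only a sketch: it assumes that compile() in "single" mode is a fair stand-in for what the interactive prompt does with each line, and the names sources, units and namespace are made up for illustration. Whether the final identity check prints True or False depends on the interpreter, so no output is shown.

```python
# Sketch: imitate the REPL by compiling each statement as its own unit.
# "single" is the compile() mode intended for interactive statements.
sources = ["a = 257", "b = 257"]
units = [compile(src, "<stdin>", "single") for src in sources]

namespace = {}
for src, code in zip(sources, units):
    # Each code object carries its own co_consts tuple.
    print(src, "->", code.co_consts)
    exec(code, namespace)

# The values are equal; whether they are the same object is up to the
# interpreter, because the two constants come from different code objects.
print(namespace["a"] == namespace["b"])
print(namespace["a"] is namespace["b"])
```

Each code object lists 257 in its own co_consts, which is the "unique constant cache" the answer refers to.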
    

    As such, each of these code objects has its own constant cache. This implies that if we remove the line break, so that both assignments are compiled together, the is check will return True:

    >>> a = 257; b = 257
    >>> a is b
    True
    

    Indeed this is the case for both Python versions. In fact, this is exactly why

    >>> a, b = 257, 257
    >>> a is b
    True
    

    returns True as well. It is not because of any property of unpacking; the two constants simply get placed in the same compilation unit.
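
To see that the two spellings are treated alike, each can be compiled as one unit and the identity of the resulting objects compared. This is a minimal sketch, again assuming compile() in "single" mode approximates a REPL line; results and namespace are illustrative names. On older interpreters the tuple form may report False.

```python
# Sketch: both spellings put the two literals into one compilation unit.
results = {}
for src in ("a = 257; b = 257", "a, b = 257, 257"):
    namespace = {}
    exec(compile(src, "<stdin>", "single"), namespace)
    # True means both names ended up bound to the same int object.
    results[src] = namespace["a"] is namespace["b"]
    print(src, "->", results[src])
```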

    The tuple assignment returns False on versions which don't fold the constants in this way; filmor links to Ideone, which shows this failing on 2.7.3 and 3.2.3. On these versions, the items of the created tuple are not shared with the other constants:

    import dis
    
    def f():
        a, b = 257, 257
        print(a is b)
    
    print(f.__code__.co_consts)
    #>>> (None, 257, (257, 257))
    
    n = f.__code__.co_consts[1]
    n1 = f.__code__.co_consts[2][0]
    n2 = f.__code__.co_consts[2][1]
    
    print(id(n), id(n1), id(n2))
    #>>> (148384292, 148384304, 148384496)
    

    Again, though, this is not about a change in how the objects are unpacked; it is only a change in how the objects are stored in co_consts.
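
A version-independent way to repeat this check is to count how many distinct 257 objects a function's co_consts holds. The helper iter_ints below is hypothetical and written only for this illustration. The layout of co_consts differs between CPython versions, so the sketch prints what it finds rather than assuming a particular tuple.

```python
def f():
    a, b = 257, 257
    return a is b

def iter_ints(consts, value=257):
    """Yield every int equal to `value` in co_consts, descending into tuples."""
    for const in consts:
        if isinstance(const, tuple):
            for item in iter_ints(const, value):
                yield item
        elif type(const) is int and const == value:
            yield const

found = list(iter_ints(f.__code__.co_consts))
distinct = len(set(id(obj) for obj in found))

print(f.__code__.co_consts)
print("occurrences:", len(found), "distinct objects:", distinct)
print("a is b:", f())
```

If every occurrence is the same object, a is b inside f is True. If the tuple holds its own copies, as in the 2.7.3 and 3.2.3 output above, it is False.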