python bytecode python-internals

Why do generator expressions and dict/set comprehensions in Python 2 use a nested function unlike list comprehensions?


List comprehensions have their code placed directly in the function where they are used, like this:

>>> dis.dis((lambda: [a for b in c]))
  1           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (c)
              6 GET_ITER            
        >>    7 FOR_ITER                12 (to 22)
             10 STORE_FAST               0 (b)
             13 LOAD_GLOBAL              1 (a)
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            7
        >>   22 RETURN_VALUE        

Whereas generator expressions and dict/set comprehensions are mostly placed in a separate nested function, like this:

>>> dis.dis((lambda: {a for b in c}))
  1           0 LOAD_CONST               1 (<code object <setcomp> at 0x7ff41a3d59b0, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (c)
              9 GET_ITER            
             10 CALL_FUNCTION            1
             13 RETURN_VALUE        

>>> dis.dis((lambda: {a for b in c}).func_code.co_consts[1])
  1           0 BUILD_SET                0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (b)
             12 LOAD_GLOBAL              0 (a)
             15 SET_ADD                  2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE        

In Python 3, all of these are placed in a nested function.
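One way to check this on your own interpreter is to look for nested code objects among the function's constants. This is only a sketch: `nested_code_names` is a hypothetical helper written for this post, and `__code__` is the Python 3 spelling of Python 2's `func_code`. Note that, as far as I know, CPython 3.12 and later inline list, dict, and set comprehensions again (PEP 709), so the first two results may be empty there, while generator expressions still get their own code object.

```python
import types


def nested_code_names(func):
    """Return the names of code objects stored in func's constants."""
    return [
        const.co_name
        for const in func.__code__.co_consts
        if isinstance(const, types.CodeType)
    ]


# The lambdas are never called, so the undefined globals a and c are harmless.
list_names = nested_code_names(lambda: [a for b in c])
set_names = nested_code_names(lambda: {a for b in c})
gen_names = nested_code_names(lambda: (a for b in c))

# The output depends on the interpreter version, so none is asserted here.
print(list_names, set_names, gen_names)
```

A non-empty list means the construct was compiled into a separate nested function; an empty list means its bytecode was placed inline.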

Why is the code placed in a separate nested function? I vaguely remember reading, a long time ago, that people wanted to stop comprehension and/or genexpr variables from spilling into the surrounding scope. Was this the fix for that?

Why are list comprehensions implemented differently from the rest in Python 2? Is it because of backwards compatibility? (I thought most of the talk about fixing the spillage came well after generator expressions were introduced, but I might just have been reading really old discussions.)


Solution

  • Yes, you are correct. In Python 3.x, list comprehensions were moved into a nested function to fix the loop variable leakage. Quoting a post from the History of Python blog, supposedly written by the BDFL himself:

    We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension "leaks" the loop control variable into the surrounding scope:

    x = 'before'
    a = [x for x in 1, 2, 3]
    print x # this prints '3', not 'before'
    

    This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.

    However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.

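    The quoted text answers your questions: list comprehensions were originally inlined for speed, generator expressions need a separate execution frame, and Python 3 switched list comprehensions to the same strategy to stop the leak.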
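To see the behaviour the quote describes, here is a small Python 3 sketch (the variable names are only for illustration). It repeats the quoted example with Python 3 syntax and then inspects a generator expression, which carries its own code object and frame.

```python
# Python 3: the comprehension's loop variable does not leak.
x = 'before'
a = [x for x in (1, 2, 3)]
print(x)  # prints: before (Python 2 would print 3)

# A generator expression is a real generator with its own code object
# and execution frame, separate from the surrounding scope.
gen = (y for y in (1, 2, 3))
print(gen.gi_code.co_name)       # prints: <genexpr>
print(gen.gi_frame is not None)  # prints: True (the generator has not finished)
```

The tuple needs explicit parentheses here because the bare `1, 2, 3` form used in the quoted Python 2 example is a syntax error in Python 3.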