Tags: python, python-3.x, list-comprehension, python-internals, generator-expression

Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?


In Python 3, is a list comprehension simply syntactic sugar for a generator expression fed into the list function?

e.g. is the following code:

squares = [x**2 for x in range(1000)]

actually converted in the background into the following?

squares = list(x**2 for x in range(1000))

I know the output is identical, and that Python 3 fixes the surprising side effects list comprehensions used to have on the surrounding namespace. But in terms of what the CPython interpreter does under the hood, is the former converted to the latter, or is there any difference in how the code gets executed?

Background

I found this claim of equivalence in the comments on this question, and a quick Google search showed the same claim being made here.

There was also some mention of this in the What's New in Python 3.0 docs, but the wording is somewhat vague:

Also note that list comprehensions have different semantics: they are closer to syntactic sugar for a generator expression inside a list() constructor, and in particular the loop control variables are no longer leaked into the surrounding scope.


Solution

  • No, the two work differently. The list comprehension version takes advantage of the special bytecode LIST_APPEND, which calls PyList_Append directly for us. Hence it avoids both the attribute lookup for list.append and a function call at the Python level.

    >>> import dis
    >>> def func_lc():
    ...     [x**2 for x in y]
    ...
    >>> dis.dis(func_lc)
      2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
                  3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
                  6 MAKE_FUNCTION            0
                  9 LOAD_GLOBAL              0 (y)
                 12 GET_ITER
                 13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 16 POP_TOP
                 17 LOAD_CONST               0 (None)
                 20 RETURN_VALUE
    
    >>> lc_object = list(dis.get_instructions(func_lc))[0].argval
    >>> lc_object
    <code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
    >>> dis.dis(lc_object)
      2           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                16 (to 25)
                  9 STORE_FAST               1 (x)
                 12 LOAD_FAST                1 (x)
                 15 LOAD_CONST               0 (2)
                 18 BINARY_POWER
                 19 LIST_APPEND              2
                 22 JUMP_ABSOLUTE            6
            >>   25 RETURN_VALUE
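
    As a reading aid, the disassembly above can be modelled in plain Python. This is only a rough sketch: `listcomp_model` is a made-up name, and the real `<listcomp>` code object appends through LIST_APPEND rather than through a `result.append` lookup and call. The listing above also comes from an older CPython; 3.12 and later inline the comprehension into the enclosing function (PEP 709), but the loop still appends through LIST_APPEND.

```python
def listcomp_model(iterator):
    """Rough Python-level model of the hidden <listcomp> code object."""
    result = []                 # BUILD_LIST 0
    for x in iterator:          # FOR_ITER / STORE_FAST x
        result.append(x ** 2)   # LIST_APPEND does this without lookup or call
    return result               # RETURN_VALUE


y = range(5)
modelled = listcomp_model(iter(y))   # the caller runs GET_ITER on y first
actual = [x ** 2 for x in y]
print(modelled == actual)            # True
```

    The `iter(y)` call mirrors the GET_ITER in the outer function: the comprehension receives an iterator (the `.0` argument), not the iterable itself.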
    

    On the other hand, the list() version simply passes the generator object to list's __init__ method, which then calls its extend method internally. Because the object is neither a list nor a tuple, CPython first gets its iterator and then adds the items to the list one by one until the iterator is exhausted:

    >>> def func_ge():
    ...     list(x**2 for x in y)
    ...
    >>> dis.dis(func_ge)
      2           0 LOAD_GLOBAL              0 (list)
                  3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
                  6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
                  9 MAKE_FUNCTION            0
                 12 LOAD_GLOBAL              1 (y)
                 15 GET_ITER
                 16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 22 POP_TOP
                 23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE
    >>> ge_object = list(dis.get_instructions(func_ge))[1].argval
    >>> ge_object
    <code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
    >>> dis.dis(ge_object)
      2           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                15 (to 21)
                  6 STORE_FAST               1 (x)
                  9 LOAD_FAST                1 (x)
                 12 LOAD_CONST               0 (2)
                 15 BINARY_POWER
                 16 YIELD_VALUE
                 17 POP_TOP
                 18 JUMP_ABSOLUTE            3
            >>   21 LOAD_CONST               1 (None)
                 24 RETURN_VALUE
    >>>
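
    Byte offsets and opcode layout change between CPython releases, so the listings above will not match every interpreter exactly. A version-tolerant way to check the key difference is to collect opcode names from each function and from any nested code objects. The sketch below assumes only the documented `dis.get_instructions` API; `opnames` is a helper name invented for this example, and the functions take `y` as a parameter purely to keep the script self-contained.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and all nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


def func_lc(y):
    return [x ** 2 for x in y]


def func_ge(y):
    return list(x ** 2 for x in y)


lc_ops = opnames(func_lc.__code__)
ge_ops = opnames(func_ge.__code__)

# The comprehension appends with LIST_APPEND and never yields.
print("LIST_APPEND" in lc_ops, "YIELD_VALUE" in lc_ops)
# The generator expression yields each item back to its consumer.
print("YIELD_VALUE" in ge_ops)
```

    Walking `co_consts` recursively means the check works whether the comprehension lives in its own `<listcomp>` code object, as in the listing above, or is inlined into the enclosing function.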
    

    Timing comparisons:

    >>> %timeit [x**2 for x in range(10**6)]
    1 loops, best of 3: 453 ms per loop
    >>> %timeit list(x**2 for x in range(10**6))
    1 loops, best of 3: 478 ms per loop
    >>> %%timeit
    out = []
    for x in range(10**6):
        out.append(x**2)
    ...
    1 loops, best of 3: 510 ms per loop
    

    The plain loop is slightly slower because out.append is looked up as an attribute on every iteration. Cache the bound method and time it again:

    >>> %%timeit
    out = []
    append = out.append
    for x in range(10**6):
        append(x**2)
    ...
    1 loops, best of 3: 467 ms per loop
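
    The timings above use IPython's %timeit magic. Outside IPython, roughly the same comparison can be run with the standard timeit module. This is a sketch, not a rigorous benchmark: the function names and the reduced input size `N` are choices made for this example, and absolute numbers (and even the ordering) will vary with the machine and the CPython version.

```python
import timeit

N = 10**4        # smaller than the 10**6 used above so the script runs quickly
CALLS = 20       # calls per timing run


def with_listcomp():
    return [x ** 2 for x in range(N)]


def with_list_genexpr():
    return list(x ** 2 for x in range(N))


def with_loop():
    out = []
    for x in range(N):
        out.append(x ** 2)      # attribute lookup on every iteration
    return out


def with_cached_append():
    out = []
    append = out.append         # look the bound method up once
    for x in range(N):
        append(x ** 2)
    return out


candidates = [with_listcomp, with_list_genexpr, with_loop, with_cached_append]
timings = {
    func.__name__: min(timeit.repeat(func, number=CALLS, repeat=3))
    for func in candidates
}
for name, seconds in timings.items():
    print(f"{name:20s} {seconds / CALLS * 1000:8.3f} ms per call")
```

    Taking the minimum of several repeats follows the timeit documentation's advice, since higher values mostly reflect interference from other processes.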
    

    Apart from the fact that list comprehensions no longer leak their loop variables, one more difference is that an unparenthesized tuple after `in` is no longer valid:

    >>> [x**2 for x in 1, 2, 3] # Python 2
    [1, 4, 9]
    >>> [x**2 for x in 1, 2, 3] # Python 3
      File "<ipython-input-69-bea9540dd1d6>", line 1
        [x**2 for x in 1, 2, 3]
                        ^
    SyntaxError: invalid syntax
    
    >>> [x**2 for x in (1, 2, 3)] # Add parentheses
    [1, 4, 9]
    >>> for x in 1, 2, 3: # Python 3: For normal loops it still works
    ...     print(x**2)
    ...
    1
    4
    9
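
    Both Python 3 behaviours mentioned above (the stricter syntax and the non-leaking loop variable) can be checked from a script rather than an interactive session. The sketch below uses the built-in compile() to test whether a snippet parses; `compiles` is a helper name made up for this example.

```python
def compiles(source):
    """Return True if `source` parses as Python in the running interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


bare_tuple_in_listcomp = compiles("[x**2 for x in 1, 2, 3]")
parenthesized_listcomp = compiles("[x**2 for x in (1, 2, 3)]")
bare_tuple_in_for_loop = compiles("for x in 1, 2, 3:\n    pass")
print(bare_tuple_in_listcomp, parenthesized_listcomp, bare_tuple_in_for_loop)
# False True True

x = "outer"
squares = [x ** 2 for x in range(3)]   # this x is local to the comprehension
print(x)                               # outer
```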