
listcomp unable to access locals defined in code called by exec if nested in function


Can any Python gurus explain why this code doesn't work?

def f(code_str):
    exec(code_str)

code = """
g = 5
x = [g for i in range(5)]
"""

f(code)

Error:

Traceback (most recent call last):
  File "py_exec_test.py", line 9, in <module>
    f(code)
  File "py_exec_test.py", line 2, in f
    exec(code_str)
  File "<string>", line 3, in <module>
  File "<string>", line 3, in <listcomp>
NameError: name 'g' is not defined

while this one works fine:

code = """
g = 5
x = [g for i in range(5)]
"""

exec(code)

I know it has something to do with locals and globals: if I pass exec() the locals and globals from my main scope, it works fine. But I don't understand exactly what is going on.

Could it be a bug with Cython?

EDIT: Tried this with Python 3.4.0 and Python 3.4.3.


Solution

  • The problem is that the list comprehension gets no closure when it is compiled inside exec().

    When you make a function (in this case a list comprehension) outside of an exec(), the compiler records its free variables (the variables used by a code block but not defined in it, i.e. g in your case), and a tuple of cells holding them is built when the function is created. This tuple is called the function's closure. It is stored in the function's __closure__ attribute.
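
To make the closure tuple concrete, here is a minimal sketch that uses an ordinary nested function instead of a list comprehension. The names outer and inner are made up for illustration; the attributes __closure__ and co_freevars are standard.

```python
def outer():
    g = 5  # local to outer, but read by inner: a free variable of inner

    def inner():
        return g

    return inner


inner = outer()

# Names the compiler recorded as free variables of inner.
free_names = inner.__code__.co_freevars
# The closure is a tuple of cell objects, one per free variable.
cell_values = [cell.cell_contents for cell in inner.__closure__]

print(free_names, cell_values)
```

The same bookkeeping is what the list comprehension in the question is missing when its source arrives as a string.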

    Inside exec(), the compiler does not build a closure for the list comprehension. The string is compiled as module-level code, so g is looked up in the globals dictionary instead. This is why adding global g at the beginning of the code works (as does globals().update(locals())).
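
The global g workaround can be sketched as follows. Two separate dictionaries stand in for the module globals and the function locals; run_isolated and the dictionary names are hypothetical helpers, not part of the original code.

```python
def run_isolated(code_str):
    # Separate dictionaries imitate calling exec() inside a function,
    # where globals and locals are different namespaces.
    fake_globals, fake_locals = {}, {}
    exec(code_str, fake_globals, fake_locals)
    return fake_globals, fake_locals


code_with_global = """
global g
g = 5
x = [g for i in range(5)]
"""

globs, locs = run_isolated(code_with_global)

# g was stored in the globals dictionary, x in the locals dictionary.
print(globs["g"], locs["x"])
```

Because g is declared global, the assignment goes to the dictionary that the list comprehension actually searches.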

    Calling exec() with a single dictionary as its second argument also solves the problem: as per the documentation, when only globals is given, that one dictionary is used for both the global and the local variables. Every assignment therefore lands in the namespace that the list comprehension searches, so the lookup succeeds.
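
A minimal sketch of that fix applied to the function from the question. The fresh dictionary named namespace is an illustrative choice; passing globals() would work too, at the cost of polluting the module namespace.

```python
def f(code_str):
    namespace = {}  # one dictionary, used as both globals and locals
    exec(code_str, namespace)
    return namespace


code = """
g = 5
x = [g for i in range(5)]
"""

result = f(code)
print(result["x"])
```

Returning the dictionary also gives the caller a clean way to read back whatever the executed code defined.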

    Here's another view on the problem:

    import dis
    
    code = """
    g = 5
    x = [g for i in range(5)]
    """
    
    a = compile(code, '<boum>', 'exec')
    dis.dis(a)
    print("###")
    dis.dis(a.co_consts[1])
    

    This code produces this bytecode:

      2           0 LOAD_CONST               0 (5)
                  3 STORE_NAME               0 (g)
    
      3           6 LOAD_CONST               1 (<code object <listcomp> at 0x7fb1b22ceb70, file "<boum>", line 3>)
                  9 LOAD_CONST               2 ('<listcomp>')
                 12 MAKE_FUNCTION            0
                 15 LOAD_NAME                1 (range)
                 18 LOAD_CONST               0 (5)
                 21 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 24 GET_ITER
                 25 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 28 STORE_NAME               2 (x)
                 31 LOAD_CONST               3 (None)
                 34 RETURN_VALUE
    ###
      3           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                12 (to 21)
                  9 STORE_FAST               1 (i)
                 12 LOAD_GLOBAL              0 (g)      <---- THIS LINE
                 15 LIST_APPEND              2
                 18 JUMP_ABSOLUTE            6
            >>   21 RETURN_VALUE
    

    Notice how the list comprehension uses LOAD_GLOBAL to load g.

    Now, if you have this code instead:

    def Foo():
        a = compile(code, '<boum>', 'exec')
        dis.dis(a)
        print("###")
        dis.dis(a.co_consts[1])
        exec(code)
    
    Foo()
    

    This produces exactly the same bytecode, which is the problem: since we're now inside a function, g is not stored in the global namespace but in the function's locals. Yet the list comprehension still looks for it among the globals (with LOAD_GLOBAL)!
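
The mismatch can be reproduced in isolation. This sketch deliberately uses a nested def as a stand-in for the list comprehension: on the Python 3.4 interpreter discussed here a comprehension is compiled to a hidden function, but newer interpreters may compile comprehensions differently, whereas a def behaves this way regardless. The helper name run_with_separate_locals is hypothetical.

```python
code_with_def = """
g = 5
def h():
    return g
x = h()
"""


def run_with_separate_locals(code_str):
    try:
        # Distinct globals and locals, as when exec() runs inside a function.
        exec(code_str, {}, {})
    except NameError as exc:
        return str(exc)
    return None


# g is stored in the locals dictionary, but h() only searches globals.
message = run_with_separate_locals(code_with_def)
print(message)
```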

    This is what the interpreter does outside of exec():

    def Bar():
        g = 5
        x = [g for i in range(5)]
    
    dis.dis(Bar)
    print("###")
    dis.dis(Bar.__code__.co_consts[2])
    

    This code gives us this bytecode:

    30           0 LOAD_CONST               1 (5)
                 3 STORE_DEREF              0 (g)
    
    31           6 LOAD_CLOSURE             0 (g)
                  9 BUILD_TUPLE              1
                 12 LOAD_CONST               2 (<code object <listcomp> at 0x7fb1b22ae030, file "test.py", line 31>)
                 15 LOAD_CONST               3 ('Bar.<locals>.<listcomp>')
                 18 MAKE_CLOSURE             0
                 21 LOAD_GLOBAL              0 (range)
                 24 LOAD_CONST               1 (5)
                 27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 30 GET_ITER
                 31 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 34 STORE_FAST               0 (x)
                 37 LOAD_CONST               0 (None)
                 40 RETURN_VALUE
    ###
     31           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                12 (to 21)
                  9 STORE_FAST               1 (i)
                 12 LOAD_DEREF               0 (g)      <---- THIS LINE
                 15 LIST_APPEND              2
                 18 JUMP_ABSOLUTE            6
            >>   21 RETURN_VALUE
    

    As you can see, g is loaded using LOAD_DEREF, which reads it from the closure tuple created by BUILD_TUPLE; that tuple holds the cell for g pushed by LOAD_CLOSURE. The MAKE_CLOSURE instruction creates a function, just like the MAKE_FUNCTION seen earlier, but with a closure attached.
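
The two listings can also be compared programmatically, which avoids reading offsets by eye. This is a sketch under a few assumptions: it again uses a nested def as a stand-in for the comprehension, and it looks up the nested code object by name because positions in co_consts differ between Python versions. The helpers opnames and nested_code are hypothetical.

```python
import dis
import types


def opnames(code_obj):
    # Set of instruction names used by a code object.
    return {ins.opname for ins in dis.get_instructions(code_obj)}


def nested_code(code_obj, name):
    # Find a nested code object by name instead of by co_consts index.
    for const in code_obj.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)


# Case 1: source compiled from a string, the way exec() sees it.
module_code = compile("g = 5\ndef h():\n    return g\n", "<string>", "exec")
exec_ops = opnames(nested_code(module_code, "h"))


# Case 2: the same names inside a real function body.
def bar():
    g = 5

    def h():
        return g

    return h


closure_ops = opnames(bar().__code__)

print("LOAD_GLOBAL" in exec_ops, "LOAD_DEREF" in closure_ops)
```

The string-compiled version reads g with LOAD_GLOBAL, while the version nested in a real function reads it with LOAD_DEREF.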

    Here's my guess as to why it works this way: closures are set up where needed when a module is first compiled. When exec() runs, it cannot tell that the functions defined in the code it executes need a closure. As far as it is concerned, any unindented code in the string is in the global scope. The only way for exec() to know that it was invoked from a context requiring a closure would be to inspect the calling scope (which seems pretty hackish to me).

    This is admittedly obscure behavior: it can be explained, but it certainly raises eyebrows when you run into it. The side effect is described in the Python documentation, though it is hard to see from there why it applies to this particular case.

    All of my analysis was done on Python 3; I have not tried anything on Python 2.